ASF JIRA
Displaying 1000 issues at 19/Mar/20 20:35.
ZooKeeper ZOOKEEPER-3764

Add High Availability Guarantee Into Docs

Improvement Open Trivial Unresolved Unassigned David Mollitor David Mollitor 19/Mar/20 16:26 19/Mar/20 20:35 19/Mar/20 16:26       documentation   0 1   {quote}
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
* https://kafka.apache.org/documentation/
{quote}

Please add a similar statement to the ZK docs, including the formula for calculating the maximum number of server failures that can be tolerated.
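The corresponding guarantee for ZooKeeper is majority-quorum based: an ensemble of N servers tolerates floor((N-1)/2) failures. A minimal illustration (not ZooKeeper code; the class and method names here are made up):

{code:java}
public class QuorumMath {

    /** Servers an ensemble of size n can lose while still keeping a majority. */
    static int maxToleratedFailures(int n) {
        // A majority quorum needs floor(n/2) + 1 survivors.
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 7}) {
            System.out.println(n + " servers tolerate "
                    + maxToleratedFailures(n) + " failure(s)");
        }
    }
}
{code}

So a 5-server ensemble tolerates 2 failures, in contrast to Kafka's "N-1 failures for replication factor N" wording quoted above.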
ZooKeeper ZOOKEEPER-3763

Restore ZKUtil.deleteRecursive in order to help compatibility of applications with 3.5 and 3.6

Wish Open Critical Unresolved Enrico Olivelli Enrico Olivelli Enrico Olivelli 18/Mar/20 14:25 19/Mar/20 15:27 19/Mar/20 07:45   3.6.0 3.6.1 java client   0 1 0 1200   In the HerdDB project (https://github.com/diennea/herddb) we are using BookKeeper, which in turn uses ZKUtil.deleteRecursive, and we are not able to switch to ZooKeeper 3.6.0.

This is the error:
java.lang.NoSuchMethodError: org.apache.zookeeper.ZKUtil.deleteRecursive(Lorg/apache/zookeeper/ZooKeeper;Ljava/lang/String;)V
Apart from fixing BookKeeper (https://github.com/apache/bookkeeper/issues/2292), we should consider restoring that signature (adding a thin method that delegates to the new one) in order to ease adoption of ZooKeeper 3.6.x.

In fact, it is very common for an application to use multiple ZooKeeper-based libraries, like HBase, BookKeeper, Pulsar, Kafka... and the user cannot upgrade to 3.6 until every other dependency is able to work with 3.6.0.

If the fix is easy, as in this case, it is worth doing to help the community.
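The "dummy method" approach can be sketched as follows. This is a hypothetical illustration: ZKUtilShim and the int parameter stand in for the real ZKUtil methods (whose exact 3.6 signature is not reproduced here); only the delegation pattern is the point.

{code:java}
public class ZKUtilShim {

    /** Stand-in for the new 3.6-style API, whose shape changed. */
    public static boolean deleteRecursive(String path, int batchSize) {
        // The real implementation would walk and delete the subtree.
        return true;
    }

    /**
     * Restored old-style signature: a thin wrapper that delegates to the
     * new method. Callers compiled against the old API keep linking,
     * because NoSuchMethodError is resolved by exact method signature.
     */
    @Deprecated
    public static void deleteRecursive(String path) {
        deleteRecursive(path, 1000); // delegate; the new return value is ignored
    }
}
{code}

The old signature costs nothing to keep and removes the binary-compatibility cliff for downstream users.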
ZooKeeper ZOOKEEPER-3762

Add Client/Server API to return available features

New Feature Open Major Unresolved Unassigned Jordan Zimmerman Jordan Zimmerman 18/Mar/20 12:04 19/Mar/20 11:52 19/Mar/20 03:35   3.6.0   c client, java client, server   0 2   Recent versions have introduced several new features/changes. Clients would benefit from an API that reports the feature set that a server instance supports. Something like (in Java):

{code}
public enum ServerFeatures {
    TTL_NODES,
    PERSISTENT_WATCHERS,
    ... etc ... full set of features TBD
}

// in ZooKeeper.java
public Collection<ServerFeatures> getServerFeatures() {
    ...
}
{code}
ZooKeeper ZOOKEEPER-3761

upgrade JLine jar dependency

Improvement Open Minor Unresolved Unassigned maoling maoling 17/Mar/20 21:51 19/Mar/20 11:52 18/Mar/20 06:30       server   0 2   JLine is currently at 2.11 (May 19, 2013), which is badly out of date; we should upgrade it to the latest release: 3.13.3 or 3.14.0.
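The upgrade is mostly a coordinate bump; note (an assumption worth verifying against the JLine docs) that JLine 3 moved to the org.jline groupId, so a sketch of the new dependency would be:

{code:xml}
<dependency>
  <groupId>org.jline</groupId>
  <artifactId>jline</artifactId>
  <version>3.13.3</version>
</dependency>
{code}

The JLine 3 API differs from 2.x (e.g. ConsoleReader was replaced by LineReader), so the ZooKeeper CLI code using it would need adjusting as well.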
ZooKeeper ZOOKEEPER-3760

remove a useless throwing CliException

Bug Open Major Unresolved Unassigned Jinjiang Ling Jinjiang Ling 16/Mar/20 20:51 19/Mar/20 11:52 16/Mar/20 21:01   3.5.7       0 1 0 600   When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the function processCmd in ZooKeeperMain.java looks like the following:
{code:java}
protected boolean processCmd(MyCommandOptions co) throws CliException, IOException, InterruptedException {
    boolean watch = false;
    try {
        watch = processZKCmd(co);
        exitCode = ExitCode.EXECUTION_FINISHED.getValue();
    } catch (CliException ex) {
        exitCode = ex.getExitCode();
        System.err.println(ex.getMessage());
    }
    return watch;
}
{code}
It declares {color:#FF0000}CliException{color} in its throws clause, but that exception is already caught inside the function, so I think the declaration can be removed.
ZooKeeper ZOOKEEPER-3759

A way to configure the jmx rmi port

Bug Open Minor Unresolved Unassigned Agostino Sarubbo Agostino Sarubbo 16/Mar/20 10:45   16/Mar/20 10:45           0 1   The start script lacks a way to configure the JMX RMI port; see also:
https://issues.apache.org/jira/browse/KAFKA-8658
[https://github.com/apache/kafka/pull/7088/commits/d02e14da8752a08bfe4f837d1cfea2c7b51e07af]
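For reference, the usual workaround is to pass the JDK's JMX system properties explicitly. A hedged sketch (the flag names are standard JDK properties; the port value, and using the JVMFLAGS variable that zkServer.sh-style scripts pass to the JVM, are illustrative):

{code}
# Pin both the JMX connector port and the RMI port so firewalls can be
# configured; without *.rmi.port the RMI port is chosen at random.
JVMFLAGS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9010 \
-Dcom.sun.management.jmxremote.rmi.port=9010 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"
export JVMFLAGS
{code}

A proper fix would expose the RMI port as a first-class setting in the start script, as the linked Kafka change does.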
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 days ago 0|z0cku0:
ZooKeeper ZOOKEEPER-3758

Update from 3.5.7 to 3.6.0 does not work

Bug Resolved Major Fixed Mate Szalay-Beko Agostino Sarubbo Agostino Sarubbo 16/Mar/20 05:52 19/Mar/20 11:48 19/Mar/20 12:11 19/Mar/20 12:10 3.6.0 3.6.1 server   0 3 0 5400   Hello,
we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.

We got the following:
{code:java}
2020-03-16 10:40:45,514 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] - Peer state changed: looking
2020-03-16 10:40:45,514 [myid:1] - WARN  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1501] - PeerState set to LOOKING
2020-03-16 10:40:45,514 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1371] - LOOKING
2020-03-16 10:40:45,514 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x0
2020-03-16 10:40:45,515 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1b, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b00000004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b00000004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b00000004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,518 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:4, n.state:LEADING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b00000004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,518 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@857] - Peer state changed: following
2020-03-16 10:40:45,518 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1453] - FOLLOWING
2020-03-16 10:40:45,518 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1246] - minSessionTimeout set to 4000
2020-03-16 10:40:45,518 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1255] - maxSessionTimeout set to 40000
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@111] - zookeeper.pathStats.slotCapacity = 60
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@112] - zookeeper.pathStats.slotDuration = 15
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@113] - zookeeper.pathStats.maxDepth = 6
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@114] - zookeeper.pathStats.initialDelay = 5
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@115] - zookeeper.pathStats.delay = 5
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@116] - zookeeper.pathStats.enabled = false
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1470] - The max bytes for all large requests are set to 104857600
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1484] - The large request threshold is set to -1
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@329] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 clientPortListenBacklog -1 datadir /opt/loway/zookeeper/logs/version-2 snapdir /opt/loway/zookeeper/data/version-2
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):Follower@75] - FOLLOWING - LEADER ELECTION TOOK - 4 MS
2020-03-16 10:40:45,519 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] - Peer state changed: following - discovery
2020-03-16 10:40:46,521 [myid:1] - WARN  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):Follower@129] - Exception when following the leader
java.lang.IllegalArgumentException
        at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314)
        at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1202)
        at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:89)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:275)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:87)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1455)
2020-03-16 10:40:46,521 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):Follower@292] - shutdown Follower{code}
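The stack trace bottoms out in Executors.newFixedThreadPool, whose underlying ThreadPoolExecutor constructor rejects a non-positive thread count. A minimal reproduction of just that failure mode (the zero stands in for whatever count Learner.connectToLeader computed at runtime):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FixedPoolSizeDemo {

    /** Returns true iff newFixedThreadPool rejects the given pool size. */
    static boolean rejectsPoolSize(int n) {
        try {
            ExecutorService pool = Executors.newFixedThreadPool(n);
            pool.shutdown();
            return false;
        } catch (IllegalArgumentException e) {
            // Same exception type as "Exception when following the leader" above.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("size 0 rejected: " + rejectsPoolSize(0));
        System.out.println("size 1 rejected: " + rejectsPoolSize(1));
    }
}
{code}

So the question is why the 3.6.0 follower ends up computing a non-positive pool size from this particular (3.5.7-era) configuration.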
ZooKeeper ZOOKEEPER-3757

Transaction log sync can take 20+ seconds after leader election when there is a large snapCount

Bug Open Minor Unresolved Unassigned Alex Kaiser Alex Kaiser 14/Mar/20 12:33   14/Mar/20 12:33   3.5.6   leaderElection   0 2   Short overview:

If you have a large snapCount (we are using 10,000,000) you can end up with a very large transaction log (ours are between 1 GB - 1.5 GB), which can cause the sync between a newly elected leader and its followers to take 20+ seconds.  This stems from the code (FileTxnIterator.getStorageSize()) in most cases returning 0 even if the transaction log is 1 GB.

 

Long Explanation:

A few years ago we had some trouble with our ZooKeeper cluster having many shortish (100-500 ms) pauses during our peak traffic times.  These turned out to result from the master taking a snapshot.  To solve this we raised the snapCount to 10,000,000 so that we weren't taking snapshots nearly as often.  We also made changes to reduce the size of our snapshots (from around 2.5 GB to ~500 MB).

I don't remember what version of zookeeper we were using originally, but this was all working fine using 3.4.10, but we started to have problems when we upgraded to 3.5.6 around 3 months ago.  We have a fairly high transaction rate and thus end up hitting the zxid overflow about once a month, which will cause a leader election.  When we were on 3.4.10, this was fine because leader election and syncing would happen within 2-4 seconds, which was low enough for us to be able to basically ignore it.  However after we upgraded to 3.5.6 the pauses we saw took between 15 - 30 seconds which were unacceptable for us.

For now to solve this I set zookeeper.forceSnapshotSync=true (yes, I know the comments say this is only supposed to be used for testing), which causes syncing using snapshots (only 10-50 MB) instead of the transaction log (1-1.5 GB).

 

Technical details:

I tried taking a look at the code and I think I know why this happens.  From what I learned, when a follower needs to sync with a leader, LearnerHandler.syncFollower() gets called on the leader.  It goes through a big if statement, but at one point it will call db.getProposalsFromTxnLog(peerLastZxid, sizeLimit).  That peerLastZxid could be a very old zxid if the follower hadn't taken a snapshot in a long time (i.e. has a large snapCount), and the sizeLimit will generally be 0.33 * snapshot size (in my case around 10 MB).

Inside of getProposalsFromTxnLog it will create a TxnIterator and then call getStorageSize() on it.  The problem comes from the fact that this call to getStorageSize() will usually return 0.  The reason is that the FileTxnIterator class has a "current" log file that it is reading, this.logFile, and a list of files that it would still have to iterate through, this.storedFiles.  The getStorageSize() function, though, only looks at the storedFiles list, so if the iterator has one large transaction log as the "current" log file and nothing in the storedFiles list, this method will return 0 even though there is a huge transaction log to sync.

One other side effect of this problem is that even bouncing a follower can cause long (5-10 second) pauses, as the leader will hold a read lock on the transaction log while it syncs up with the follower.

While I know what the problem is, I don't know what the best solution is.  I'm willing to work on the solution, but I would appreciate suggestions.  One possible solution would be to include this.logFile in the getStorageSize() calculation; however, this could cause the estimate to overestimate the amount of data in the iterator (possibly by a lot), and I don't know what the consequences of that are.  I'm not quite sure what a good way to get an accurate estimate is.
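The proposed direction can be sketched numerically (a hypothetical helper: the real FileTxnIterator tracks File objects, but plain byte counts stand in for file sizes here):

{code:java}
public class TxnLogSizeEstimate {

    /** Mirrors the current behavior: only the stored (queued) files are counted. */
    static long currentEstimate(long[] storedFileBytes) {
        long total = 0;
        for (long b : storedFileBytes) {
            total += b;
        }
        return total; // the "current" log file is ignored
    }

    /** Proposed fix: also count the log file the iterator is positioned in. */
    static long proposedEstimate(long currentLogBytes, long[] storedFileBytes) {
        return currentLogBytes + currentEstimate(storedFileBytes);
    }

    public static void main(String[] args) {
        long oneBigLog = 1_500_000_000L; // a single 1.5 GB transaction log
        System.out.println("current:  " + currentEstimate(new long[0]));
        System.out.println("proposed: " + proposedEstimate(oneBigLog, new long[0]));
    }
}
{code}

As noted above, counting the whole current file can over-estimate when the iterator starts mid-file, which remains the open question.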

 
ZooKeeper ZOOKEEPER-3756

Members failing to rejoin quorum

Improvement In Progress Major Unresolved Mate Szalay-Beko Dai Shi Dai Shi 11/Mar/20 16:43   19/Mar/20 15:57   3.5.6, 3.5.7 3.6.1, 3.5.8 leaderElection   0 4 0 5400   Not sure if this is the place to ask, please close if it's not.

I am seeing some behavior that I can't explain since upgrading to 3.5:

In a 5 member quorum, when server 3 is the leader and each server has this in their configuration: 
{code:java}
server.1=100.71.255.254:2888:3888:participant;2181
server.2=100.71.255.253:2888:3888:participant;2181
server.3=100.71.255.252:2888:3888:participant;2181
server.4=100.71.255.251:2888:3888:participant;2181
server.5=100.71.255.250:2888:3888:participant;2181{code}
If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in the logs:
{code:java}
2020-03-11 20:23:35,720 [myid:2] - INFO [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
2020-03-11 20:23:35,721 [myid:2] - INFO [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id = 2, proposed zxid=0x1b8005f4bba
2020-03-11 20:23:35,733 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 2)
2020-03-11 20:23:35,734 [myid:2] - INFO [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36140
2020-03-11 20:23:35,735 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (4, 2)
2020-03-11 20:23:35,740 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (5, 2)
2020-03-11 20:23:35,740 [myid:2] - INFO [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36142
2020-03-11 20:23:35,740 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-03-11 20:23:35,742 [myid:2] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
2020-03-11 20:23:35,744 [myid:2] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 3 my id = 2
2020-03-11 20:23:35,745 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker{code}
The only way I can seem to get them to rejoin the quorum is to restart the leader.

However, if I remove server 4 and 5 from the configuration of server 1 or 2 (so only servers 1, 2, and 3 remain in the configuration file), then they can rejoin the quorum fine. Is this expected and am I doing something wrong? Any help or explanation would be greatly appreciated. Thank you.
ZooKeeper ZOOKEEPER-3755

Use maven to create fatjar

Improvement In Progress Major Unresolved Sushant Mane Sushant Mane Sushant Mane 09/Mar/20 18:57   18/Mar/20 12:58   3.6.0, 3.7.0   build, contrib-fatjar   0 1 0 3000   Replace ant with maven for building the fatjar.
ZooKeeper ZOOKEEPER-3754

ZooKeeper unusable in OSGi - missing headers

Bug Open Major Unresolved Unassigned Amichai Rothman Amichai Rothman 09/Mar/20 03:30   09/Mar/20 03:30   3.6.0   build   0 1   Trying to upgrade Aries RSA from ZooKeeper 3.4.14 to 3.6.0, I found that itests (pax exam) fail - only to discover that the new ZooKeeper release jar is now missing all OSGi headers.

ZooKeeper releases are generally few and far between, which makes it disappointing to see such regression bugs - it means other projects must remain on ancient versions of ZK for a long time.

It would be great if the OSGi headers could be added back and a new minor release cut expeditiously, or alternatively a separate zookeeper-osgi bundle be distributed in addition to the standard one if there are issues with the standard one for some reason.

 
ZooKeeper ZOOKEEPER-3753

Documentation website is broken

Bug Resolved Major Duplicate Damien Diederen Vik Gamov Vik Gamov 09/Mar/20 01:01 19/Mar/20 11:49 09/Mar/20 03:47 09/Mar/20 03:47     documentation   0 1   Hi 

it looks like the documentation website isn't working properly. All links are 404.

 

[https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands]
ZooKeeper ZOOKEEPER-3752

document not found

Improvement Resolved Minor Duplicate Damien Diederen green makey green makey 08/Mar/20 23:57   09/Mar/20 03:47 09/Mar/20 03:43 3.5.7 3.5.7 documentation   0 1   I want to use the ZooKeeper documentation,

but I find following links

[http://zookeeper.apache.org/doc/r3.5.7/index.html]

[http://zookeeper.apache.org/doc/current/index.html]

and so on

are

Not Found

The requested URL was not found on this server.

 

I hope someone can help solve this.

 

 
ZooKeeper ZOOKEEPER-3751

upgrade jackson-databind to 2.10 from 2.9

Task Resolved Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 08/Mar/20 20:46 19/Mar/20 11:52 14/Mar/20 12:47 14/Mar/20 12:47 3.6.0, 3.5.7, 3.7.0 3.7.0, 3.6.1 security   0 0 0 2400   Upgrade jackson-databind to 2.10 from 2.9. 2.10 is the current latest version. Also, we've been seeing lots of vulnerability reports with 2.9 - perhaps this will help.
ZooKeeper ZOOKEEPER-3750

update jackson-databind to address CVE-2020-9547, CVE-2020-9548, CVE-2020-9546

Bug Open Blocker Unresolved Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 07/Mar/20 21:19   19/Mar/20 07:48   3.6.0, 3.5.7, 3.7.0 3.5.8 security   0 1 0 3600   OWASP is flagging jackson-databind again due to CVE-2020-9547, CVE-2020-9548, and CVE-2020-9546.

We need to update to 2.9.10.4
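The bump amounts to a one-line dependency change; a sketch (the Maven coordinates are jackson-databind's well-known ones; the version is from the description above):

{code:xml}
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.9.10.4</version>
</dependency>
{code}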
ZooKeeper ZOOKEEPER-3749

https://zookeeper.apache.org/documentation.html links all dead

Bug Open Minor Unresolved Unassigned Lisa Beal Lisa Beal 07/Mar/20 19:59   09/Mar/20 03:47   3.5.7 3.5.7 documentation   0 2   The links on the website https://zookeeper.apache.org ([https://zookeeper.apache.org/documentation.html]) are all dead, rendering all ZooKeeper documentation inaccessible.
ZooKeeper ZOOKEEPER-3748

Resolve release requirements in download page

Bug Resolved Major Fixed Zili Chen Zili Chen Zili Chen 07/Mar/20 11:47   14/Mar/20 13:50 14/Mar/20 12:49   3.7.0     0 1 0 8400   There is no link to the KEYS file; this is essential for verifying signatures.

Also, the links to release artifacts must not use downloads.a.o; they must use the mirror system.
ZooKeeper ZOOKEEPER-3747

zookeeper.sessionRequireClientSASLAuth not working

Bug Resolved Major Cannot Reproduce Mate Szalay-Beko SledgeHammer SledgeHammer 06/Mar/20 13:24   09/Mar/20 13:19 09/Mar/20 12:36 3.6.0       0 2   Windows 10 Professional

JDK 11.0.6
I have updated my 3.6.0 zkServer.cmd to include the new "-Dzookeeper.sessionRequireClientSASLAuth=true" flag. However, I am still able to connect with anonymous Kafka clients. Am I missing something?
ZooKeeper ZOOKEEPER-3746

Move the download page to downloads.apache.org

Bug Resolved Major Fixed Zili Chen Zili Chen Zili Chen 05/Mar/20 05:56   07/Mar/20 08:36 07/Mar/20 08:35         0 1 0 2400
ZooKeeper ZOOKEEPER-3745

Update copyright notices from 2019 to 2020

Bug Resolved Major Fixed Zili Chen Zili Chen Zili Chen 05/Mar/20 05:54   14/Mar/20 13:50 14/Mar/20 12:52   3.7.0, 3.6.1     0 1 0 2400
ZooKeeper ZOOKEEPER-3744

Make build reproducible

Wish Open Major Unresolved Unassigned Erik Erik 28/Feb/20 20:26   28/Feb/20 20:28           0 1   Maven has added built-in support for creating reproducible builds; see [https://maven.apache.org/guides/mini/guide-reproducible-builds.html]. The ZooKeeper build is very close to being reproducible as is and would only require minor changes to get all the way there:
# Add the property project.build.outputTimestamp to the top level project object model.
# Upgrade the maven-jar-plugin to 3.2.0.
# Enable notimestamp for the maven-javadoc-plugin.
# Figure out what to do with the classes Version, VersionInfo and VersionInfoMain.

I did the first three and attached is the resulting diffoscope log for .\zookeeper-assembly\target\apache-zookeeper-3.7.0-SNAPSHOT-bin.tar.gz.

For 1 and 4 I suggest using a Git commit timestamp.

For 3 I believe there could be a fix in the upcoming 3.2.0 version of the maven-javadoc-plugin that would make notimestamp unnecessary. See https://issues.apache.org/jira/browse/MJAVADOC-627.
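Step 1 can be sketched as a one-property pom.xml change (the timestamp value here is illustrative; per the linked Maven guide, an ISO-8601 timestamp or epoch seconds both work):

{code:xml}
<properties>
  <!-- Fixes the timestamps embedded in jar/tar entries, making archives byte-identical across builds -->
  <project.build.outputTimestamp>2020-03-19T00:00:00Z</project.build.outputTimestamp>
</properties>
{code}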

 
 
ZooKeeper ZOOKEEPER-3743

Poor error messages about the parsing of the jaas.conf file

Improvement Open Major Unresolved Mate Szalay-Beko Alexandre Anouthcine Alexandre Anouthcine 27/Feb/20 11:38   28/Feb/20 11:03   3.5.6       0 2   Debian 10. I'm trying to set up a cluster of 3 ZooKeeper nodes and I'm struggling to understand the error messages regarding the jaas.conf file.

I found 2 articles about this setup, including [Server-Server mutual authentication|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication].

The jaas.conf file is the same on all 3 nodes and it seems to be working:
QuorumServer {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_zkinternal="pa$$word";
};

QuorumLearner {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zkinternal";
  password="pa$$word";
};

Now I want to connect a Solr client to it. I found an article about [Client-Server mutual authentication|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication].

It shows an example:

Server {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_super="adminsecret"
  user_bob="bobsecret";
};

The problem is when I try to change my original jaas.conf file to something like:

QuorumServer {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_zkinternal="pa$$word";
  user_solr="solrsecret";
};

QuorumLearner {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zkinternal"
  password="pa$$word";
};

or 
QuorumServer {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_zkinternal="pa$$word";
};

QuorumLearner {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zkinternal"
  password="pa$$word";
};

Server {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_solr="solrsecret";
};

or even
QuorumServer {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_zkinternal="pa$$word"
};

QuorumLearner {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zkinternal"
  password="pa$$word";
};

(Notice the missing semicolon for the user)

I always get the same error message which doesn't make much sense to me:
|2020-02-27 16:24:07,815 [myid:] - INFO  [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
2020-02-27 16:24:07,815 [myid:] - INFO  [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
2020-02-27 16:24:07,822 [myid:] - INFO  [main:QuorumPeerConfig@385] - clientPortAddress is 0.0.0.0/0.0.0.0:2181
2020-02-27 16:24:07,822 [myid:] - INFO  [main:QuorumPeerConfig@389] - secureClientPort is not set
2020-02-27 16:24:08,676 [myid:3] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2020-02-27 16:24:08,678 [myid:3] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2020-02-27 16:24:08,679 [myid:3] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2020-02-27 16:24:08,680 [myid:3] - INFO  [main:ManagedUtil@46] - Log4j found with jmx enabled.
2020-02-27 16:24:08,690 [myid:3] - INFO  [main:QuorumPeerMain@141] - Starting quorum peer
2020-02-27 16:24:08,697 [myid:3] - INFO  [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2020-02-27 16:24:08,712 [myid:3] - ERROR [main:ServerCnxnFactory@231] - No JAAS configuration section named 'Server' was found in '/conf/jaas.conf'.
2020-02-27 16:24:08,730 [myid:3] - ERROR [main:QuorumPeerMain@101] - Unexpected exception, exiting abnormally
java.io.IOException: No JAAS configuration section named 'Server' was found in '/conf/jaas.conf'.
     at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:232)
     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:646)
     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:148)
     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)

So what does this error mean, and how do I fix it?
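For reference, the ZooKeeper server looks up a JAAS section literally named {{Server}} (that is the name in the configureSaslLogin stack trace above). A minimal sketch of what such a section could look like with the DigestLoginModule (the {{user_zkinternal}} entry name and password are illustrative, not a verified configuration):

{code}
Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_zkinternal="pa$$word";
};
{code}

Note the semicolon terminating the last option and the closing brace of the section; a malformed section can prevent the JAAS parser from registering the section under its name, which then surfaces as "No JAAS configuration section named 'Server'".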
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3742

How to use rsyslog with ZooKeeper when modifying the log4j port from the default 514?

Improvement Open Major Unresolved Unassigned yaojingen yaojingen 27/Feb/20 09:39   27/Feb/20 09:39   3.5.6   other   0 1   I want to change the log4j UDP port to connect to rsyslog.

If I set the following configuration parameters in the conf/log4j.properties file:

log4j.appender.SYSLOG.host=localhost
log4j.appender.SYSLOG.port=515
log4j.appender.SYSLOG.protocol=UDP

it reports the following errors in the ZooKeeper log file:

log4j:WARN No such property [port] in org.apache.log4j.net.SyslogAppender.
log4j:WARN No such property [protocol] in org.apache.log4j.net.SyslogAppender.
log4j:WARN No such property [host] in org.apache.log4j.net.SyslogAppender.
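The warnings above come from log4j 1.x's SyslogAppender, which has no {{port}}, {{protocol}}, or {{host}} bean properties. A possible workaround, assuming a log4j 1.2.14+ SyslogAppender whose {{SyslogHost}} property accepts a {{host:port}} value (please verify against your log4j version):

{code}
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=localhost:515
log4j.appender.SYSLOG.facility=LOCAL0
{code}

A separate protocol property isn't needed, since the log4j 1.x SyslogAppender sends over UDP.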
3 weeks ago
ZooKeeper ZOOKEEPER-3741

Fix ZooKeeper 3.5 C client build on Fedora8

Improvement Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 27/Feb/20 07:31   28/Feb/20 03:49 27/Feb/20 08:23 3.6.0 3.7.0, 3.6.1 c client   0 2 0 2400   Using the new RHEL / CentOS 8 docker images, it was not possible to build the ZooKeeper C client for 3.5.5 and 3.5.6. The compilation error was fixed by [~ztzg] in ZOOKEEPER-3719 for branch 3.5 and 3.5.7, but one of the errors is still present on the master branch.

We had a warning about a call like {{sprintf(buf,"%s:%d",addrstr,port);}}, where both {{buf}} and {{addrstr}} are 128-byte char arrays, so in theory the destination can overflow. The fix is to increase the length of the destination string array ({{buf}}).

This problem causes a compile-time warning / failure only on 3.5.5 and 3.5.6. It was fixed on 3.5.7, and on the 3.6+ branches the compiler cannot detect the problem due to code refactoring done in ZOOKEEPER-3068, but the issue is still present on the master branch.

100% 100% 2400 0 pull-request-available
2 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3740

Fix PurgeTxnTest.testPurgeWhenLogRollingInProgress

Improvement Open Major Unresolved Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 27/Feb/20 03:44   05/Mar/20 08:34           0 2 0 1200   I am not sure if it is a test problem or a real problem, but this test always fails for me locally when I execute a full {{mvn clean test}} to test the release candidates.

It also fails frequently on Jenkins, e.g.:
[https://builds.apache.org/view/ZK%20All/job/zookeeper-master-maven/681/]


{code:java}
2020-02-24 06:14:55,914 [myid:] - WARN [main:JUnit4ZKTestRunner$LoggedInvokeMethod@105] - TEST METHOD FAILED testPurgeWhenLogRollingInProgress
java.lang.AssertionError: ZkClient ops is not finished!
at org.apache.zookeeper.server.PurgeTxnTest.manyClientOps(PurgeTxnTest.java:590)
at org.apache.zookeeper.server.PurgeTxnTest.testPurgeWhenLogRollingInProgress(PurgeTxnTest.java:154)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:86)
2020-02-24 06:14:57,824 [myid:] - ERROR [main:ZKTestCase$1@101] - FAILED testPurgeWhenLogRollingInProgress
java.lang.AssertionError: ZkClient ops is not finished!
at org.apache.zookeeper.server.PurgeTxnTest.manyClientOps(PurgeTxnTest.java:590)
at org.apache.zookeeper.server.PurgeTxnTest.testPurgeWhenLogRollingInProgress(PurgeTxnTest.java:154)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:86)
2020-02-24 06:14:57,824 [myid:] - INFO [main:ZKTestCase$1@91] - FINISHED testPurgeWhenLogRollingInProgress
{code}
 
100% 100% 1200 0 pull-request-available
2 weeks ago
ZooKeeper ZOOKEEPER-3739

Remove use of com.sun.nio.file.SensitivityWatchEventModifier

Bug Patch Available Major Unresolved Christopher Tubbs Christopher Tubbs Christopher Tubbs 25/Feb/20 13:51 19/Mar/20 11:48 19/Mar/20 15:02     3.7.0, 3.6.1 build, server   0 2 0 6000   To better support building on newer JDKs, the unsupported class com.sun.nio.file.SensitivityWatchEventModifier must not be used.

I will submit a PR for this.
100% 100% 6000 0 pull-request-available
3 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3738

Avoid use of broken codehaus properties-maven-plugin

Bug Resolved Major Fixed Christopher Tubbs Christopher Tubbs Christopher Tubbs 25/Feb/20 12:38   02/Mar/20 12:43 02/Mar/20 12:43   3.7.0, 3.6.1 build   0 2 0 5400   properties-maven-plugin uses an older version of plexus-utils that fails to read environment variables properly when a variable is multi-line.

This bug causes it to be difficult to build ZooKeeper in some environments (Fedora, with a default bash 4 shell, for example).

Since ZooKeeper only uses this plugin to get the git commit id, the plugin can be removed and replaced with a more specific plugin that achieves the same job with simpler config (https://github.com/koraktor/mavanagaiata).

I'm working on a PR for this, which will come shortly.
100% 100% 5400 0 pull-request-available
3 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3737

Unable to eliminate log4j1 transitive dependency

Bug Resolved Major Fixed Christopher Tubbs Christopher Tubbs Christopher Tubbs 24/Feb/20 20:20   01/Mar/20 01:56 01/Mar/20 01:56 3.4.14, 3.5.7 3.7.0, 3.6.1, 3.5.8 jmx, server   0 4 0 4800   Apache Accumulo is trying to switch to using log4j2 only. However, this seems impossible, because ZooKeeper has a hard-coded dependency on log4j 1.2 for some sort of jmx thing. The following is the error and stack trace I get whenever I remove log4j 1.2 from the class path and try to run a test instance of ZooKeeper as part of Accumulo's build test suite.

{code}
2020-02-24T20:10:03,682 [jmx.ManagedUtil] ERROR: Problems while registering log4j jmx beans!
java.lang.ClassNotFoundException: org.apache.log4j.jmx.HierarchyDynamicMBean
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
at java.lang.Class.forName0(Native Method) ~[?:?]
at java.lang.Class.forName(Class.java:315) ~[?:?]
at org.apache.zookeeper.jmx.ManagedUtil.registerLog4jMBeans(ManagedUtil.java:72) ~[zookeeper-3.5.7.jar:3.5.7]
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:94) ~[zookeeper-3.5.7.jar:3.5.7]
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64) ~[zookeeper-3.5.7.jar:3.5.7]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.apache.accumulo.start.Main.lambda$execMainClass$1(Main.java:167) ~[accumulo-start-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at java.lang.Thread.run(Thread.java:834) [?:?]
{code}

I know previous work has been done on ZOOKEEPER-850 and ZOOKEEPER-1371 to eliminate the use of log4j in the source, but this work does not appear to be complete, since it is still required at runtime (at least, for the server... but maybe for the client too... it's hard to tell from the way Accumulo runs its test suite, and I'm not super familiar with ZK internals).
100% 100% 4800 0 pull-request-available
2 weeks, 4 days ago
Reviewed
ZooKeeper ZOOKEEPER-3736

Zookeeper auto purge process does not purge files

Bug Open Major Unresolved Unassigned Shubham S Shubham S 24/Feb/20 17:15   25/Feb/20 11:37   3.4.14       0 3   Docker version 3.5.

Mac OS.
Hi, I am building a ZooKeeper docker image from the official docker images. I am currently using ZooKeeper 3.4.14. I have 3 containers, each running a ZooKeeper server, and I am setting the environment variables as shown in the docker-compose.yml file:

environment:

  ZOO_AUTOPURGE_PURGEINTERVAL: 24
  ZOO_AUTOPURGE_SNAPRETAINCOUNT: 3

I can clearly see the values reflecting back in my zoo.cfg.

 
{code:java}
cat /conf/zoo.cfg
clientPort=xxxx
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
maxClientCnxns=60
server.1=zoo1:xxxx:xxxx
server.2=zoo2:xxxx:xxxx
server.3=0.0.0.0:xxxx:xxxx
{code}
 

Getting into the container and running printenv, I can see the values reflected as well.

 
{code:java}
printenv
ZOO_DATA_LOG_DIR=/datalog
HOSTNAME=0536b195f621
JAVA_HOME=/usr/local/openjdk-8
ZOO_DATA_DIR=/data
JAVA_BASE_URL=https://github.com/AdoptOpenJDK/openjdk8-upstream-binaries/releases/download/jdk8u232-b09/OpenJDK8U-jre_
ZOO_INIT_LIMIT=5
PWD=/datalog/version-2
JAVA_URL_VERSION=8u232b09
ZOO_AUTOPURGE_SNAPRETAINCOUNT=3
HOME=/root
LANG=C.UTF-8
ZOO_SYNC_LIMIT=2
ZOO_SERVERS=server.1=zoo1:xxxx:xxxx server.2=zoo2:xxxx:xxxx server.3=0.0.0.0:xxxx:xxxx
SHLVL=1
ZOO_MY_ID=3
ZOO_MAX_CLIENT_CNXNS=60
ZOO_TICK_TIME=2000
ZOO_CONF_DIR=/conf
PATH=/usr/local/openjdk-8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/zookeeper-3.4.14/bin
ZOOCFGDIR=/conf
ZOO_AUTOPURGE_PURGEINTERVAL=24
JAVA_VERSION=8u232
ZOO_LOG_DIR=/logs
OLDPWD=/zookeeper-3.4.14
_=/usr/bin/printenv
{code}
 

 

I can also clearly see the purge task being completed:

 
{code:java}
2020-02-18T16:49:56.605549689Z 2020-02-18 16:49:56,604 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-18T16:49:56.636000804Z 2020-02-18 16:49:56,635 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-19T16:49:56.606280261Z 2020-02-19 16:49:56,605 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-19T16:49:56.657389039Z 2020-02-19 16:49:56,657 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-20T16:49:56.605362615Z 2020-02-20 16:49:56,604 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-20T16:49:56.612265088Z 2020-02-20 16:49:56,611 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-21T16:49:56.605773207Z 2020-02-21 16:49:56,604 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-21T16:49:56.643037255Z 2020-02-21 16:49:56,642 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-22T16:49:56.605712054Z 2020-02-22 16:49:56,605 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-22T16:49:56.661826480Z 2020-02-22 16:49:56,661 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-23T16:49:56.606569211Z 2020-02-23 16:49:56,604 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-23T16:49:56.629269327Z 2020-02-23 16:49:56,628 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-02-24T16:49:56.605299157Z 2020-02-24 16:49:56,604 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-02-24T16:49:56.606483941Z 2020-02-24 16:49:56,606 [myid:3] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
{code}
 

But neither the logs nor the snapshots are being deleted.

I have redeployed the entire stack and even built a new image from the official docker images, but I am still getting the same result.

 

Running the following command manually does work, but I don't want to do it by hand:
./zkCleanup.sh -n 3

 

Can someone help me out? 
3 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3735

fix the bad format of RATE_LOGGER

Bug Open Minor Unresolved Unassigned maoling maoling 23/Feb/20 20:35   05/Mar/20 04:48       server   0 2   {code:java}
} else if (digestFromLoadedSnapshot.zxid != 0 && zxid > digestFromLoadedSnapshot.zxid) {
RATE_LOGGER.rateLimitLog("The txn 0x{} of snapshot digest does not "
+ "exist.", Long.toHexString(digestFromLoadedSnapshot.zxid));
}
{code}
the printed log looks like this:
{code:java}
Message:The txn 0x{} of snapshot digest does not exist. Value:fa4e00000082
{code}
*1.* The *0x{}* placeholder takes no effect: the value is never substituted into the message.

*2.* *RATE_LOGGER.rateLimitLog* is not used the same way as an ordinary LOG call.
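A minimal pure-Java sketch of the mismatch (the two helpers below only mimic the output shapes; they are not the actual ZooKeeper RateLogger or SLF4J implementations):

```java
// Sketch: why "0x{}" survives in the output. rateLimitLog appends the value
// ("Message:<msg> Value:<value>") instead of substituting it into the "{}"
// placeholder the way an ordinary SLF4J-style LOG call would.
public class RateLogDemo {
    // Mimics the rate logger's observed output shape
    static String rateLimitLog(String msg, String value) {
        return "Message:" + msg + " Value:" + value;
    }

    // Mimics SLF4J-style "{}" substitution
    static String slf4jStyle(String msg, Object arg) {
        return msg.replaceFirst("\\{\\}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        String zxid = "fa4e00000082";
        // Current (confusing) output: the placeholder is never filled in
        System.out.println(rateLimitLog(
                "The txn 0x{} of snapshot digest does not exist.", zxid));
        // One possible fix: format the value into the message up front
        System.out.println(rateLimitLog(
                "The txn of snapshot digest does not exist.", "0x" + zxid));
    }
}
```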
2 weeks ago
ZooKeeper ZOOKEEPER-3734

upgrade jackson-databind to address CVE-2020-8840

Task Resolved Blocker Fixed Enrico Olivelli Patrick D. Hunt Patrick D. Hunt 22/Feb/20 21:59   25/Feb/20 10:10 23/Feb/20 11:48 3.6.0, 3.5.7, 3.7.0 3.6.0, 3.7.0, 3.5.8 security   0 0 0 1800   owasp check is failing with

[ERROR] jackson-databind-2.9.10.2.jar: CVE-2020-8840

looks like we need to upgrade to 2.9.10.3 or later.
100% 100% 1800 0 pull-request-available
3 weeks, 4 days ago
ZooKeeper ZOOKEEPER-3733

Fix issues reported in 3.6.0rc3

Task Resolved Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 21/Feb/20 10:54   24/Feb/20 01:05 24/Feb/20 01:05   3.6.0, 3.7.0     0 1 0 4200   - metrics library LICENSE file has wrong file name
- spotbugs is not passing because the Info.java interface sets a null value for "qualifier"
- the name of the directory inside the source tarball is not consistent with the file name and with the 3.5 tradition
100% 100% 4200 0 pull-request-available
3 weeks, 3 days ago
Reviewed
ZooKeeper ZOOKEEPER-3732

Update jUnit to 5.6

Improvement In Progress Major Unresolved Tamas Penzes Tamas Penzes Tamas Penzes 18/Feb/20 10:41   24/Feb/20 10:27   3.6.0       0 1 0 600   JUnit 4 is limited to Java 7 features; JUnit 5 can use the new elements of JDK 8.

It's time to move forward and update our JUnit version to 5.6.
100% 100% 600 0 junit5, pull-request-available
4 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3731

Disable HTTP TRACE Method

Improvement Open Critical Unresolved Unassigned Aaron Aaron 18/Feb/20 07:54   18/Feb/20 07:54   3.5.7       0 2   ZooKeeper uses embedded Jetty, which allows the HTTP TRACE method by default. This is a widely-known security concern; please disable the HTTP TRACE method.

 

See CVE-2004-2320, CVE-2010-0386, and CVE-2003-1567 for more info.

 

Example:
{quote}{{$ curl -vX TRACE 10.32.99.185:8080}}
{{* Rebuilt URL to: 10.32.99.185:8080/}}
{{* Trying 10.32.99.185...}}
{{* TCP_NODELAY set}}
{{* Connected to 10.32.99.185 (10.32.99.185) port 8080 (#0)}}
{{> TRACE / HTTP/1.1}}
{{> Host: 10.32.99.185:8080}}
{{> User-Agent: curl/7.59.0}}
{{> Accept: */*}}
{{>}}
{{< HTTP/1.1 200 OK}}
{{< Date: Tue, 18 Feb 2020 12:38:35 GMT}}
{{< Content-Type: message/http}}
{{< Content-Length: 81}}
{{< Server: Jetty(9.4.17.v20190418)}}
{{<}}
{{TRACE / HTTP/1.1}}
{{User-Agent: curl/7.59.0}}
{{Accept: */*}}
{{Host: 10.32.99.185:8080}}
{{* Connection #0 to host 10.32.99.185 left intact}}{quote}
4 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3730

fix a typo about watchManagerName in the zookeeperAdmin.md

Improvement Open Trivial Unresolved Unassigned maoling maoling 17/Feb/20 21:20   02/Mar/20 08:12       documentation   0 2 0 3600   {code:java}
* *watchManaggerName* :
(Java system property only: **zookeeper.watchManagerName**)
**New in 3.6.0:** Added in [ZOOKEEPER-1179](https://issues.apache.org/jira/browse/ZOOKEEPER-1179)
New watcher manager WatchManagerOptimized is added to optimize the memory overhead in heavy watch use cases. This
config is used to define which watcher manager to be used. Currently, we only support WatchManager and
WatchManagerOptimized.
{code}
*watchManaggerName* is a typo for *watchManagerName*.
100% 100% 3600 0 pull-request-available
3 weeks, 4 days ago
ZooKeeper ZOOKEEPER-3729

fix a typo about watchManagerName in the zookeeperAdmin.md

Improvement Resolved Trivial Duplicate Unassigned maoling maoling 17/Feb/20 21:17   17/Feb/20 21:22 17/Feb/20 21:22     documentation   0 1   {code:java}
* *watchManaggerName* :
(Java system property only: **zookeeper.watchManagerName**)
**New in 3.6.0:** Added in [ZOOKEEPER-1179](https://issues.apache.org/jira/browse/ZOOKEEPER-1179)
New watcher manager WatchManagerOptimized is added to optimize the memory overhead in heavy watch use cases. This
config is used to define which watcher manager to be used. Currently, we only support WatchManager and
WatchManagerOptimized.
{code}
*watchManaggerName* is a typo for *watchManagerName*.
4 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3728

move traceMask calculation logic into the trace log in the FinalRequestProcessor#processRequest

Improvement Patch Available Minor Unresolved Brittany Barnes maoling maoling 15/Feb/20 07:21   18/Mar/20 14:05       server   0 2 0 600   {code:java}
LOG.debug("Processing request:: {}", request);

// request.addRQRec(">final");
long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
if (request.type == OpCode.ping) {
traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
}
if (LOG.isTraceEnabled()) {
ZooTrace.logRequest(LOG, traceMask, 'E', request, "");
}
{code}
# Remove the useless commented-out line *// request.addRQRec(">final");*
# Most read/write requests hit this code, but the traceMask computation is wasted work when trace logging is disabled; the traceMask calculation logic should be moved inside the LOG.isTraceEnabled() check.
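The second point can be sketched like this (a self-contained illustration of the proposed control-flow change; the OpCode and ZooTrace mask values below are stand-ins, not the real ZooKeeper constants):

```java
// Sketch of the proposed change: compute traceMask only when trace logging is
// enabled, so non-traced requests skip the (small but needless) work entirely.
public class TraceMaskDemo {
    static final int OP_PING = 11;                        // stand-in for OpCode.ping
    static final long CLIENT_REQUEST_TRACE_MASK = 1 << 1; // stand-in ZooTrace masks
    static final long SERVER_PING_TRACE_MASK = 1 << 5;

    // In the proposed version, this is only evaluated inside the trace guard
    static long traceMask(int requestType) {
        return requestType == OP_PING
                ? SERVER_PING_TRACE_MASK
                : CLIENT_REQUEST_TRACE_MASK;
    }

    static void processRequest(int requestType, boolean traceEnabled) {
        if (traceEnabled) {
            long mask = traceMask(requestType); // moved inside the guard
            System.out.println("trace mask = 0x" + Long.toHexString(mask));
        }
        // ... normal request processing continues here ...
    }
}
```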
100% 100% 600 0 pull-request-available
3 weeks, 1 day ago
Reviewed
ZooKeeper ZOOKEEPER-3727

Fix 3.5 source tarball to represent the git repository

Improvement Open Major Unresolved Unassigned Norbert Kalmár Norbert Kalmár 14/Feb/20 06:59   14/Feb/20 07:00     3.5.8     0 1   There are some differences in the source tarball for recent 3.5 releases, like:
- Info.java file in zookeeper-server (generated class)
- checkstyle.xml is missing
- git.properties file present in tarball
- missing .gitattributes and .gitignore (the question is: should these even be included?)
4 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3726

invalid ipv6 address comparison

Bug Open Major Unresolved Unassigned Vladislav Tyulbashev Vladislav Tyulbashev 13/Feb/20 11:45   05/Mar/20 16:28   3.5.6   c client   0 1 0 5400   Zookeeper C Client periodically resolves server names since https://issues.apache.org/jira/browse/ZOOKEEPER-1355

After DNS resolution, the IP addresses are checked against the previous set of addresses. However, currently only the first few bytes are compared (it is assumed that all addresses are IPv4-length).

Case:
1) zookeeper server operates only by ipv6
2) client connects to it by some hostname (zookeeper-1.news.yandex.ru, for example)
3) container with zookeeper server dies, new container is up, and zookeeper-1.news.yandex.ru now points to new address
4) only a few bits in the ipv6 address change
5) the zookeeper client ignores the change in address, because with the incorrect strcmp size the first bytes compared equal
6) the zookeeper client now can't reconnect to zookeeper without manual intervention, because it keeps trying the old address

This is the proposed fix: [https://github.com/apache/zookeeper/pull/1252]
100% 100% 5400 0 pull-request-available
5 weeks ago
ZooKeeper ZOOKEEPER-3725

Zookeeper fails to establish quorum with 2 servers using 3.5.6

Bug Resolved Major Duplicate Mate Szalay-Beko Antoine DESSAIGNE Antoine DESSAIGNE 13/Feb/20 03:36   09/Mar/20 08:05 09/Mar/20 08:05 3.5.6       0 5   Hello everyone,

We noticed that ZooKeeper 3.5.6 fails to establish quorum on a new deployment on a regular basis (approx. 50% of the time).

We were able to reduce the reproduction steps to the bare minimum we could. Consider the following docker-compose.yml file
{noformat}
version: '2'
services:
  orchestrator1.cameltest.int:
    image: zookeeper:3.5.6
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888
  orchestrator2.cameltest.int:
    image: zookeeper:3.5.6
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=0.0.0.0:2888:3888
{noformat}
When launching a brand new cluster with it (with {{docker-compose up}}, no previous data) it fails half of the time with 3.5.6 and never in 3.4.14.

You'll find attached 3 logs:
* a failure one using 3.5.6
* a success one using 3.5.6
* a success one 3.4.14

I don't think it's related to some docker/docker-compose issue (as it's working using 3.4.14 on the same server)

I'll try to check each intermediate release to pin a more specific version.

Unfortunately, I don't yet know my way around the ZooKeeper code. What can I do to help? Thanks!

PS: Yes, it's strange to have 2 servers, as both are required for quorum, but it's the smallest repro case.
1 week, 3 days ago
ZooKeeper ZOOKEEPER-3724

[Java Client] - Calculation of connectionTimeout needs improvement.

Bug Open Major Unresolved Unassigned Deepak Vilakkat Deepak Vilakkat 13/Feb/20 01:01   13/Feb/20 01:01       java client   0 1   [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L439]

This makes scaling ZooKeeper an issue unless all clients are notified that they need to increase their sessionTimeout to a large value. We already had a production outage when a client in an Asia data center was trying to write to a ZooKeeper server in America for cross-colo announcements. The session timeout was kept at 5000 ms and had worked all along, but the cluster size was increased, which made the derived connect timeout less than 200 ms. Since it is practically impossible to connect within that time, we increased the session timeout.

 

Shouldn't there be a floor value, like 5 seconds, below which this value cannot drop? Theoretically this calculation can make even connections over a local network time out in some use cases.

 

This was also discussed in [http://zookeeper-user.578899.n2.nabble.com/How-to-modify-Client-Connection-timer-td7583017.html#a7583019] and I am trying to understand if there is some other catch for this implementation.
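The reporter's proposal can be sketched as follows (the division mirrors the linked ClientCnxn logic of splitting the session timeout across the server list; the 5000 ms floor and all names here are illustrative, not actual ZooKeeper behavior):

```java
// Sketch: per-host connect timeout with a floor. Today the timeout shrinks as
// the ensemble grows (sessionTimeout / serverCount), which is what caused the
// reported cross-colo outage; a floor keeps the value usable.
public class ConnectTimeoutDemo {
    static final int MIN_CONNECT_TIMEOUT_MS = 5000; // proposed floor (illustrative)

    static int connectTimeout(int sessionTimeoutMs, int serverCount) {
        int derived = sessionTimeoutMs / serverCount;
        return Math.max(derived, MIN_CONNECT_TIMEOUT_MS);
    }
}
```

With a 5000 ms session timeout and 30 servers, the derived value today would be 166 ms; with the floor it stays at 5000 ms.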
5 weeks ago
ZooKeeper ZOOKEEPER-3723

Zookeeper Client should not fail with ZSYSTEMERROR if DNS does not resolve one of the servers in the zk ensemble.

Improvement Open Minor Unresolved Unassigned Suhas Dantkale Suhas Dantkale 12/Feb/20 11:58   25/Feb/20 01:23   3.5.5   c client, java client   1 2 0 4200   This is a minor enhancement request to not fail the session initiation if the DNS is not able to resolve the hostname of one of the servers in the Zookeeper ensemble.

 

The Zookeeper client resolves all the hostnames in the ensemble while establishing the session.

In a Kubernetes environment with coreDNS, the hostname entry gets removed from coreDNS during pod restarts. Though we can manipulate the coreDNS settings to delay the removal of the hostname entry from DNS, we don't want to leave any race where the ZooKeeper client is trying to establish a session and fails because DNS is temporarily unable to resolve the hostname. So, as long as one of the servers in the ensemble is DNS-resolvable, shouldn't we avoid failing session establishment with a hard error and instead try to establish the connection with one of the other servers?

 

Look at the snippet below, where resolve_hosts() fails with ZSYSTEMERROR.
{code:java}
if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
            //bug in getaddrinfo implementation when it returns
            //EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
            // ai_flags as AI_ADDRCONFIG
#ifdef AI_ADDRCONFIG
            if ((hints.ai_flags == AI_ADDRCONFIG) &&
// ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
#ifdef EAI_ADDRFAMILY
                ((rc ==EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
#else
                (rc == EAI_BADFLAGS)) {
#endif
                //reset ai_flags to null
                hints.ai_flags = 0;
                //retry getaddrinfo
                rc = getaddrinfo(host, port_spec, &hints, &res0);
            }
#endif
            if (rc != 0) {
                errno = getaddrinfo_errno(rc);
#ifdef _WIN32
                LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
#elif __linux__ && __GNUC__
                LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
#else
                LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
#endif
                rc=ZSYSTEMERROR;
                goto fail;
            }
        }
{code}
100% 100% 4200 0 pull-request-available
3 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3722

make logs of ResponseCache more readable

Improvement Open Minor Unresolved Nishanth Entoor maoling maoling 12/Feb/20 03:32   17/Feb/20 16:08       server   0 2 0 600   The logs look redundant:
{code:java}
2020-02-12 16:16:09,208 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2183)(secure=disabled):ResponseCache@45] - Response cache size is initialized with value 400.
2020-02-12 16:16:09,208 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2183)(secure=disabled):ResponseCache@45] - Response cache size is initialized with value 400.{code}
What we want is:
{code:java}
2020-02-12 16:16:09,208 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2183)(secure=disabled):ResponseCache@45] - getData Response cache size is initialized with value 400.
2020-02-12 16:16:09,208 [myid:3] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2183)(secure=disabled):ResponseCache@45] - getChild Response cache size is initialized with value 400.
{code}
100% 100% 600 0 pull-request-available
4 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3721

Making the boolean configuration parameters consistent

Improvement Patch Available Major Unresolved Unassigned Ctest Ctest 11/Feb/20 16:45   28/Feb/20 15:31   3.5.6   server   0 2 0 7200   *Description*

QuorumPeerConfig.java uses the Java built-in method
{code:java}
Boolean.parseBoolean(String value){code}
to parse almost all boolean parameters. When the value is "true" (ignoring case), this method returns true; otherwise, it returns false. This means all of these boolean parameters accept any string and silently translate it into false as long as it is not "true".

standaloneEnabled and reconfigEnabled are two exceptions because they only accept "true" or "false":
{code:java}
} else if (key.equals("standaloneEnabled")) {
if (value.toLowerCase().equals("true")) {
setStandaloneEnabled(true);
} else if (value.toLowerCase().equals("false")) {
setStandaloneEnabled(false);
} else {
throw new ConfigException("Invalid option "
+ value
+ " for standalone mode. Choose 'true' or 'false.'");
}{code}
 

*Improvement*

To improve this part, I am trying to unify all these boolean parser methods and make them more robust. Generally, I wrote a parseBoolean in QuorumPeerConfig.java which only accepts "true" or "false", and used this method for parsing all boolean parameters.
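The unified strict parser could look roughly like this (the class and method names are an illustrative sketch, not the actual patch; the real change would presumably throw QuorumPeerConfig's ConfigException rather than IllegalArgumentException):

```java
// Sketch: a strict boolean parser that accepts only "true" or "false"
// (case-insensitive) and rejects everything else, instead of silently
// mapping unknown values to false as Boolean.parseBoolean does.
public class StrictBooleanConfig {
    static boolean parseBoolean(String key, String value) {
        if ("true".equalsIgnoreCase(value)) {
            return true;
        }
        if ("false".equalsIgnoreCase(value)) {
            return false;
        }
        throw new IllegalArgumentException(
                "Invalid option " + value + " for " + key + ". Choose 'true' or 'false'.");
    }
}
```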
100% 100% 7200 0 pull-request-available
4 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3720

Rolling upgrade failure due to invalid protocol version

Improvement Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 11/Feb/20 05:08   16/Feb/20 04:25 16/Feb/20 04:25 3.6.0, 3.7.0 3.6.0     0 5 0 15000   During the rolling upgrade of a 3.5.6 cluster to 3.6.0, the peers are not able to talk to each other due to different protocol version in the QuorumCnxManager introduced by ZOOKEEPER-3188.

 

We need to fix the rolling upgrade between 3.5.6 and 3.6.0 in a way that the quorum is always up and healthy.
100% 100% 15000 0 pull-request-available
4 weeks, 4 days ago
ZooKeeper ZOOKEEPER-3719

C Client compilation issues in 3.5.7-rc

Bug Closed Major Fixed Damien Diederen Damien Diederen Damien Diederen 07/Feb/20 10:45 19/Mar/20 11:50 27/Feb/20 08:13 09/Feb/20 08:55 3.5.6 3.5.7 c client   0 1 0 4800   The C client included in {{release-3.5.7-rc0}} and {{release-3.5.7-rc1}} suffers from a few issues:

# It configures, but "forgets" to build the C code in the {{full-build}} profile;
# Compilation actually fails with GCC 8.3, as the {{Makefile}} uses {{-Werror}} and the compiler detects a couple of possible buffer overruns;
# The {{WIN32}} branch of the code does not compile because of a change in socket representation.

This should probably be set to "blocker," but I don't know if the C client is supposed to block a release. Oh, and the first issue, at least, also existed in 3.5.6, and it seems nobody complained :)

A "pull request" is in the works.
100% 100% 4800 0 pull-request-available
5 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3718

Generated source tarball is missing some files

Bug Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 07/Feb/20 08:28   14/Feb/20 10:23 10/Feb/20 02:14 3.5.6 3.5.7     0 1   As [~eolivelli] pointed out:

{quote}
I see differences between the contents of the source tarball and the
git tag (using Meld, as suggested by Patrick some month ago), namely:
- there is not checkstyleSuppressions.xml file, and mvn
checkstyle:check fails (it is not bound to the default lifecycle, so
mvn clean install still works)
- there are ".c" generated files, they should not be part of the source release
- there is not "dev" directory
- there is not .travis.yml file
{quote}

Note: only affects branch-3.5.
5 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3717

Zookeeper won't work with swarm service VIP when enabling healthcheck

Bug Open Major Unresolved Unassigned Baohua Baohua 04/Feb/20 13:30   04/Feb/20 15:21   3.5.1, 3.5.2, 3.5.3, 3.5.4, 3.5.5, 3.4.14, 3.5.6       0 2   *Issue*:

When using a swarm service to create a ZooKeeper cluster with a service VIP and a healthcheck enabled in the zookeeper docker image (even one that directly returns 0), the client cannot connect to the server (its status is "connecting", and then it retries), but the server status reports that everything is OK.

*Root cause*:

When the swarm service starts, the service VIP does not work until the healthcheck passes. The client service then seems to be left in an abnormal state: even after the healthcheck passes and the ZooKeeper cluster is normal, no client can connect to ZooKeeper.

*Solution (potential)*:

After the cluster status becomes OK, the client service needs to be reset to allow connections.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 2 days ago 0|z0b5jc:
ZooKeeper ZOOKEEPER-3716

upgrade netty 4.1.42 to address CVE-2019-20444 CVE-2019-20445

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 03/Feb/20 14:05   14/Feb/20 10:23 04/Feb/20 04:35 3.6.0, 3.5.6, 3.7.0 3.6.0, 3.5.7 security, server   0 2 0 3000   OWASP dependency-check is failing

upgrade netty 4.1.42 to address CVE-2019-20444 CVE-2019-20445

[ERROR] netty-transport-4.1.42.Final.jar: CVE-2019-20445, CVE-2019-20444

We need to upgrade to netty 4.1.45 (current latest) or later
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
5 weeks, 4 days ago 0|z0b468:
ZooKeeper ZOOKEEPER-3715

Kerberos Authentication related tests fail for new JDK versions

Improvement Closed Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 03/Feb/20 08:00   14/Feb/20 10:23 04/Feb/20 04:39   3.6.0, 3.5.7     0 2 0 5400   Using OpenJDK 1.8.242 or OpenJDK 11.0.6, I got some Kerberos-related exceptions when running the following Kerberos authentication tests:
- QuorumKerberosAuthTest
- QuorumKerberosHostBasedAuthTest
- SaslKerberosAuthOverSSLTest
 
the error:
{code:bash}
2020-02-03 12:11:07,197 [myid:localhost:11223] - ERROR [main-SendThread(localhost:11223):ZooKeeperSaslClient@336] - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: null (5001))]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
 {code}
more detailed stack trace:
{code:bash}
Found ticket for zkclient/localhost@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Tue Feb 04 13:49:14 CET 2020
Found ticket for zkclient/localhost@EXAMPLE.COM to go to krbtgt/EXAMPLE.COM@EXAMPLE.COM expiring on Tue Feb 04 13:49:14 CET 2020
Entered Krb5Context.initSecContext with state=STATE_NEW
Service ticket not found in the subject
>>> Credentials serviceCredsSingle: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> CksumType: sun.security.krb5.internal.crypto.HmacSha1Aes128CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbKdcReq send: kdc=localhost TCP:62653, timeout=30000, number of retries =3, #bytes=586
>>> KDCCommunication: kdc=localhost TCP:62653, timeout=30000,Attempt =1, #bytes=586
>>>DEBUG: TCPClient reading 112 bytes
>>> KrbKdcReq send: #bytes read=112
>>> KdcAccessibility: remove localhost:62653
>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError: sTime is Mon Feb 03 13:49:14 CET 2020 1580734154000 suSec is 100 error code is 5001 error Message is null crealm is EXAMPLE.COM sname is zkquorum/localhost@EXAMPLE.COM msgType is 30
>>> Credentials serviceCredsSingle: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> CksumType: sun.security.krb5.internal.crypto.HmacSha1Aes128CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbKdcReq send: kdc=localhost TCP:62653, timeout=30000, number of retries =3, #bytes=586
>>> KDCCommunication: kdc=localhost TCP:62653, timeout=30000,Attempt =1, #bytes=586
>>>DEBUG: TCPClient reading 112 bytes
>>> KrbKdcReq send: #bytes read=112
>>> KdcAccessibility: remove localhost:62653
>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError: sTime is Mon Feb 03 13:49:14 CET 2020 1580734154000 suSec is 100 error code is 5001 error Message is null crealm is EXAMPLE.COM sname is zkquorum/localhost@EXAMPLE.COM msgType is 30
KrbException: null (5001)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:70)
    at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:226)
    at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:237)
    at sun.security.krb5.internal.CredentialsUtil.serviceCredsSingle(CredentialsUtil.java:400)
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:287)
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:263)
    at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:118)
    at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:490)
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    at org.apache.zookeeper.client.ZooKeeperSaslClient$1.run(ZooKeeperSaslClient.java:320)
    at org.apache.zookeeper.client.ZooKeeperSaslClient$1.run(ZooKeeperSaslClient.java:317)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:317)
    at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:303)
    at org.apache.zookeeper.client.ZooKeeperSaslClient.sendSaslPacket(ZooKeeperSaslClient.java:366)
    at org.apache.zookeeper.client.ZooKeeperSaslClient.initialize(ZooKeeperSaslClient.java:403)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1192)
Caused by: KrbException: Identifier doesn't match expected value (906)
    at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
    at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
    at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
    at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
    ... 20 more
2020-02-03 13:49:14,942 [myid:localhost:11223] - ERROR [main-SendThread(localhost:11223):ZooKeeperSaslClient@336] - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: null (5001))]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
{code}
 
After trying this with different JDK versions, we see that the problem appears
* between OpenJDK 8.232 and 8.242 for Java 8
* and between 11.0.3 and 11.0.6 for Java 11

There are a lot of Kerberos-related changes after 8.232: see [https://hg.openjdk.java.net/jdk8u/jdk8u/jdk]

 
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 2 days ago 0|z0b3tc:
ZooKeeper ZOOKEEPER-3714

Add (Cyrus) SASL authentication support to Perl client

New Feature Open Major Unresolved Damien Diederen Damien Diederen Damien Diederen 03/Feb/20 06:09   03/Feb/20 06:56       contrib-bindings   0 1 0 1200   ZOOKEEPER-1112 adds SASL support to the C client library (via the Cyrus SASL implementation). This ticket is about building on that to enable Perl clients to authenticate using {{DIGEST-MD5}} or {{GSSAPI}}. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 3 days ago 0|z0b3ow:
ZooKeeper ZOOKEEPER-3713

ReadOnlyZooKeeperServer should not expose the uninitialized ZKDatabase to client during the snapshot loading.

Bug Open Major Unresolved Pierre Yin Pierre Yin Pierre Yin 03/Feb/20 05:50 18/Mar/20 21:52 26/Feb/20 01:23   3.6.0, 3.4.14, 3.5.6   server   0 1 0 15600   The follower/observer may load a snapshot from disk or from the leader in some scenarios. During snapshot loading, the follower/observer may lose its connection to the leader if the network breaks. In the current design, the follower/observer switches into read-only mode immediately when the connection to the leader is broken, so it may become a ReadOnlyZooKeeperServer before the ZKDatabase initialization from the snapshot has finished. The time window between the switch to read-only mode and ZKDatabase's full snapshot load is unsafe.

The unsafe window may confuse Curator's NodeCache. If NodeCache's underlying reconnection hits the unsafe window, it may get a NoNode KeeperException for the specified path and clear the NodeCache. When the unsafe window has elapsed, NodeCache can see the data again.

This behavior is not correct. From the client's point of view, it gets a null value for a short period when the ensemble's network is broken. Curator's NodeCache is often used as a configuration source; returning null is confusing and introduces logic errors in that scenario.

I think the better behavior would be to reject all reconnections during the unsafe window. NodeCache would then keep its old data while reconnections are rejected. This behavior makes sense.
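The proposal above can be sketched as a simple gate (class and method names here are hypothetical, not from the actual patch): client connections are rejected until the snapshot has been fully loaded into the ZKDatabase, so clients never observe a half-initialized tree.

```java
public class ReadOnlyGate {
    // Hypothetical sketch of the proposal: keep rejecting read-only client
    // connections while the snapshot is still loading, so clients never see
    // an empty, half-initialized ZKDatabase.
    private volatile boolean dbInitialized = false;

    /** Returns true only once the snapshot load has completed. */
    public boolean acceptConnection() {
        return dbInitialized;
    }

    /** Called when ZKDatabase has finished loading the snapshot. */
    public void markInitialized() {
        dbInitialized = true;
    }
}
```

With this in place, a NodeCache reconnect attempt during the unsafe window fails to connect (and keeps its old data) rather than reading an empty database.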

I will send my patch later. I hope someone can help review it.
100% 100% 15600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 2 days ago https://github.com/apache/zookeeper/pull/1247 0|z0b3o0:
ZooKeeper ZOOKEEPER-3712

Add setKeepAlive support for NIOServerCnxn

New Feature Resolved Major Fixed Pierre Yin Pierre Yin Pierre Yin 03/Feb/20 02:45   11/Feb/20 00:51 11/Feb/20 00:51 3.6.0, 3.4.14, 3.5.6 3.6.1 server   0 1 0 3000   I suggest adding setKeepAlive support to NIOServerCnxn. It can resolve some TCP connection leaks caused by network failures. In occasional cases (a broken network switch, a broken network card, iptables firewall rules, and so on), the ZooKeeper server loses the FIN packet when the client closes the connection. In such a scenario, the connection is treated as alive forever and is never closed.
These leaked TCP connections are a resource-leak risk.

Calling setKeepAlive on every client NIO connection prevents this resource leak.
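A minimal sketch of the idea (not the actual ZooKeeper patch; the helper name is illustrative): enable SO_KEEPALIVE on each accepted NIO channel, so the kernel probes idle peers and eventually tears down connections whose FIN was lost.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class KeepAliveAccept {
    // Hypothetical sketch: enable SO_KEEPALIVE on every accepted NIO
    // connection so the kernel probes idle peers and eventually closes
    // connections whose FIN packet was lost to a network failure.
    public static SocketChannel configure(SocketChannel sc) throws IOException {
        sc.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        return sc;
    }
}
```

The keepalive probe interval is governed by kernel settings (e.g. tcp_keepalive_time on Linux), so a leaked half-open connection is reaped after the configured idle period rather than lingering forever.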

I will send the patch later. I hope someone can help review it.

Thanks.
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
5 weeks, 2 days ago https://github.com/apache/zookeeper/pull/1242 0|z0b3h4:
ZooKeeper ZOOKEEPER-3711

Dispose SaslServer instances after use

Bug Open Minor Unresolved Damien Diederen Damien Diederen Damien Diederen 02/Feb/20 10:11   03/Feb/20 03:37       server   0 1 0 3600   The {{SaslServer}} instance held by a {{ServerCnxn}} is not explicitly disposed of when the connection is closed. This means that we are relying on the GC finalizer to release the associated resources.

While this does not seem to be problematic in practice, it would be better to explicitly {{dispose()}} the object at {{close()}} time. This is unlikely to make a difference for managed providers, but {{-Dsun.security.jgss.native=true}} can potentially change the game.
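The proposed cleanup can be sketched as follows (a hedged illustration, not the actual patch; the helper name is hypothetical): dispose the {{SaslServer}} eagerly at close time, swallowing the checked exception so connection teardown is never interrupted.

```java
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

public class CnxnSaslCleanup {
    // Hypothetical sketch of the fix: release SASL provider resources
    // eagerly when the connection closes, instead of waiting for the GC
    // finalizer. Matters most with -Dsun.security.jgss.native=true.
    public static void closeSasl(SaslServer saslServer) {
        if (saslServer == null) {
            return; // connection never completed SASL negotiation
        }
        try {
            // may be a no-op for managed providers, but frees native
            // resources when a native GSS provider is in use
            saslServer.dispose();
        } catch (SaslException e) {
            // best-effort cleanup; proceed with closing the connection
        }
    }
}
```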

----

(For reference, in case somebody searches for this.)

This came up while investigating a file descriptor leak related to the use of the native Sun provider. The issue turned out *not* to be due to the missing dispose, but seems to be caused by a long-standing bug in the MIT Kerberos replay cache:

https://github.com/xrootd/xrootd/issues/414

{quote}
Actually, this is a bug in the kerberos library as we really do close the cache but the descriptor may still leak. This is a known issue and has been fixed in various version of kerberos but apparently not in the version being used here. The only mitigation is to not export tickets (which is not necessary).
{quote}

The problem exists in MIT Kerberos 1.7.1, but will be fixed in 1.8, which replaces the problematic component with a new implementation:

{noformat}
commit e8a35f6962ce2d048616fb7457bff2d90398ca48
Author: Greg Hudson <ghudson@mit.edu>
Date: Wed May 15 01:01:34 2019 -0400

Use file2 replay cache by default

Remove the existing default replay cache implementation and replace it
with a wrapper around the file2 replay cache code. Change the
filename to krb5_EUID.rcache2, ignoring the residual (and therefore
the server principal name). On Windows, use the local appdata
directory if KRB5RCACHEDIR is not set in the environment.

ticket: 8786
{noformat}
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 4 days ago 0|z0b2zs:
ZooKeeper ZOOKEEPER-3710

[trivial bug] fix compile error in PurgeTxnTest introduced by ZOOKEEPER-3231

Bug Resolved Trivial Fixed maoling maoling maoling 30/Jan/20 23:07   31/Jan/20 06:01 31/Jan/20 06:01   3.7.0 tests   0 1 0 600   100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 6 days ago 0|z0b114:
ZooKeeper ZOOKEEPER-3709

Pre-Size Buffer in Learner Request Method

Improvement Open Minor Unresolved Unassigned David Mollitor David Mollitor 30/Jan/20 19:06   09/Mar/20 03:32           0 2 0 1800   {code:java|title=Learner.java}
void request(Request request) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream oa = new DataOutputStream(baos);
    oa.writeLong(request.sessionId);
    oa.writeInt(request.cxid);
    oa.writeInt(request.type);
    if (request.request != null) {
        request.request.rewind();
        int len = request.request.remaining();
        byte[] b = new byte[len];
        request.request.get(b);
        request.request.rewind();
        oa.write(b);
    }
    oa.close();
    QuorumPacket qp = new QuorumPacket(Leader.REQUEST, -1, baos.toByteArray(), request.authInfo);
    writePacket(qp, true);
}
{code}

The default internal array size of {{ByteArrayOutputStream}} is 32 bytes. It will be expanded as required but this operation is not optimal. Since the exact size of the buffer can be pre-determined (long, int, int, request buffer size), it would be better to specify the array size in {{ByteArrayOutputStream}} before writing to it.
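A sketch of the suggested change (the helper class and its shape are illustrative, not the actual patch): compute the exact size up front and pass it to the {{ByteArrayOutputStream}} constructor, so the internal array is never resized.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PresizedBuffer {
    // Illustrative helper mirroring Learner#request: one long (8 bytes) plus
    // two ints (4 bytes each) of header, followed by the request payload.
    public static byte[] serialize(long sessionId, int cxid, int type, byte[] payload)
        throws IOException {
        int size = 8 + 4 + 4 + (payload != null ? payload.length : 0);
        // pre-sized: the backing array is allocated once, never grown
        ByteArrayOutputStream baos = new ByteArrayOutputStream(size);
        DataOutputStream oa = new DataOutputStream(baos);
        oa.writeLong(sessionId);
        oa.writeInt(cxid);
        oa.writeInt(type);
        if (payload != null) {
            oa.write(payload);
        }
        oa.close();
        return baos.toByteArray();
    }
}
```

Without pre-sizing, a payload larger than 32 bytes forces at least one array copy inside the stream; with it, serialization performs exactly one allocation of the right size.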
100% 100% 1800 0 newbie, noob, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 2 days ago 0|z0b0uo:
ZooKeeper ZOOKEEPER-3708

Move Logging Code into Logging Guard in Learner

Improvement Closed Minor Fixed David Mollitor David Mollitor David Mollitor 30/Jan/20 19:01   14/Feb/20 10:23 02/Feb/20 22:32   3.5.7, 3.7.0, 3.6.1     0 1 0 3600   {code:java|title=Learner.java}
void readPacket(QuorumPacket pp) throws IOException {
    ...
    long traceMask = ZooTrace.SERVER_PACKET_TRACE_MASK;
    if (pp.getType() == Leader.PING) {
        traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
    }
    if (LOG.isTraceEnabled()) {
        ZooTrace.logQuorumPacket(LOG, traceMask, 'i', pp);
    }
}
{code}

The traceMask only matters if trace is enabled, so move it and the associated code into the logging guard.
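The shape of the change can be sketched like this (using java.util.logging in place of ZooKeeper's logger, and illustrative mask values): the mask selection moves inside the guard, so the common non-trace path pays nothing for it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedTrace {
    // Illustrative mask values; ZooTrace defines the real ones.
    public static final long SERVER_PACKET_TRACE_MASK = 1L << 6;
    public static final long SERVER_PING_TRACE_MASK = 1L << 7;

    private static final Logger LOG = Logger.getLogger(GuardedTrace.class.getName());

    public static long traceMask(boolean isPing) {
        return isPing ? SERVER_PING_TRACE_MASK : SERVER_PACKET_TRACE_MASK;
    }

    public static void readPacket(boolean isPing) {
        if (LOG.isLoggable(Level.FINEST)) {
            // the mask is only computed when trace logging is enabled
            long traceMask = traceMask(isPing);
            LOG.finest("quorum packet, traceMask=" + traceMask);
        }
    }
}
```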
100% 100% 3600 0 newbie, noob, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 3 days ago 0|z0b0ug:
ZooKeeper ZOOKEEPER-3707

Leadership Election gets stuck in 5 node ensemble

Bug Open Major Unresolved Unassigned Suhas Dantkale Suhas Dantkale 27/Jan/20 20:48   30/Jan/20 14:05   3.5.5   leaderElection   0 3   Scenario:
1. 5 node ensemble-(SID 1,2,3,4,5). 5 is the current Leader.
2. Test brings down 5's ZK process.
3. Leadership election begins. First each SID votes itself to be the leader as expected.
4. SID 1 and SID 2 get a notification from SID 3 before they get a notification from SID 4. They update their votes to propose 3 as the leader, as expected, and send notifications.
5. SID 3 receives the notifications from 1, 2, and itself; its election predicate is successfully terminated, it goes to the LEADING state, comes out of FLE, and goes to the next phase.
6. SID 2 meanwhile goes to the FOLLOWING state, comes out of FLE, and goes to the next phase (NEWLEADER sending etc.).

so far so good.
7. Meanwhile (somewhere after step 4) SID 1 receives the notification from SID 4, and since SID 4 > SID 3 (and the zxid is the same), SID 1 changes its mind and updates its proposal, now to elect 4 as leader, and sends notifications.
8. SID 4 is trying to elect itself as leader. And even though SID 2 and SID 3 are out of the election, SID 4 cannot get out of the election because not enough nodes are following 3 (only 1 is following 3).
9. SID 1 is also stuck in FLE, like SID 4.

So, in summary, SID 1 and 4 are stuck in FLE (in lookForLeader()) and SID 2 and SID 3 are stuck in the next phase, because SID 3's NEWLEADER is not acknowledged by a quorum.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
7 weeks ago 0|z0awqw:
ZooKeeper ZOOKEEPER-3706

ZooKeeper.close() would leak SendThread when the network is broken

Bug Resolved Major Fixed Pierre Yin Pierre Yin Pierre Yin 25/Jan/20 03:28   02/Mar/20 21:15 02/Mar/20 21:13 3.6.0, 3.4.14, 3.5.6 3.6.1 java client   0 2 0 36000   The close method of ZooKeeper may cause the leak of SendThread when the network is broken.

When the network is broken, the SendThread of the ZooKeeper client falls into a continuous reconnect loop. But there is an unsafe point, just before startConnect(), during this continuous reconnecting. If SendThread.close() in another thread hits the unsafe point, startConnect() sleeps for some time and then forces the state to States.CONNECTING even though SendThread.close() has already set the state to States.CLOSED. In this case, the SendThread never dies and nobody ever changes the state again.

In the normal case, ZooKeeper.close() blocks forever waiting for the closeSession packet to finish, until the network failure is recovered. But if the user has set a request timeout, ZooKeeper.close() breaks out of the blocking wait when the timeout elapses and invokes SendThread.close() to change the state to CLOSED. That's why SendThread.close() can hit the unsafe point.
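The race described above can be illustrated with a small sketch (hypothetical names, not the actual fix): the reconnect loop must never blindly overwrite CLOSED, so the transition to CONNECTING is done with a compare-and-set rather than a plain write.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CloseRaceSketch {
    public enum State { CONNECTING, CLOSED }

    // Hypothetical sketch: startConnect() must re-check the state and only
    // move to CONNECTING if close() has not already won the race. A blind
    // write here is exactly the bug: it resurrects a CLOSED SendThread.
    public static boolean tryStartConnect(AtomicReference<State> state) {
        State s = state.get();
        if (s == State.CLOSED) {
            return false; // close() already ran; let the SendThread exit
        }
        return state.compareAndSet(s, State.CONNECTING);
    }
}
```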

Setting a request timeout is a very common practice.

I will propose a patch and send it out later. I hope someone can help review it.

Thanks
100% 100% 36000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks, 2 days ago https://github.com/apache/zookeeper/pull/1235 0|z0auw0:
ZooKeeper ZOOKEEPER-3705

Filtering unreachable hosts without using ICMP

Improvement Open Major Unresolved Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 23/Jan/20 09:46   19/Mar/20 11:15   3.6.0 3.7.0     0 2   This is a follow-up ticket for ZOOKEEPER-3698, which was a quick fix to make the multi-address feature (introduced in ZOOKEEPER-3188) work on Mac if ICMP throttling is enabled.

The whole purpose of the multi-address feature is to always try to use an address that works. The current implementation (in the case of leader election) always filters the address list using {{InetAddress.isReachable()}} calls to find out which server address is working. This causes ICMP calls (or TCP connections to port 7 (Echo) on the destination host), depending on the native implementation (see the [Oracle docs|https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#isReachable(int)]).

So if {{InetAddress.isReachable}} cannot reach the host, the current multi-address feature will not be able to treat the given address as a working one. Basically, right now it cannot distinguish between a broken network link (when the whole node is unreachable) and disabled ICMP (when only ICMP and port 7 are disabled in the firewall of the destination host).

A few ideas for how to handle this better:
* One way to improve this could be to implement something like the {{ruok}} 4LW command for the server ports: a simple request-response message that only shows that the server is alive and listening on the given election / quorum port. Then we could use that instead of the ICMP calls.
* Another way could be to implement something like what the Learner is doing right now (if I remember correctly, it basically starts to connect to all known quorum ports in parallel, then keeps the connection which is established first). However, it might be more tricky in the case of the leader election protocol...
* Another way would be simply to try to establish a connection to the election addresses one by one, and go to the next one if the call fails. It would be slower, but we wouldn't rely on {{InetAddress.isReachable()}}.
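The last idea above can be sketched in a few lines (a hedged illustration, not a proposed patch): probe reachability with a plain TCP connect to the election port itself, so no ICMP or port 7 Echo is involved.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpReachability {
    // Sketch of the "connect one-by-one" idea: a TCP connect to the actual
    // election/quorum port tells us the peer is up and listening, regardless
    // of whether ICMP or port 7 is firewalled on the destination host.
    public static boolean isReachable(InetSocketAddress addr, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or no route
        }
    }
}
```

The cost is one (possibly timed-out) connect attempt per candidate address, which is why the ticket notes this approach would be slower than a single {{isReachable()}} call.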

A few challenges we also need to consider:
* it can be tricky to detect whether the current election address has become unavailable. This is another edge case where we currently use {{InetAddress.isReachable()}} (this is why we call {{SendWorker.asyncValidateIfSocketIsStillReachable()}})
* we also need to take backward compatibility into consideration for the leader election protocol during rolling upgrades

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 hours ago 0|z0aswo:
ZooKeeper ZOOKEEPER-3704

upgrade maven dependency-check to 5.3.0

Task Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 22/Jan/20 11:38   14/Feb/20 10:23 23/Jan/20 10:52 3.6.0, 3.5.6, 3.7.0 3.6.0, 3.5.7, 3.7.0 build, security   0 0 0 3600   Upgrade maven dependency checker to the latest version. 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 1 day ago 0|z0aroo:
ZooKeeper ZOOKEEPER-3703

Publish a Test-Jar from ZooKeeper Server

Improvement Closed Major Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 21/Jan/20 16:27   14/Feb/20 10:23 04/Feb/20 04:44 3.5.6 3.6.0, 3.5.7 tests   0 1 0 2400   It would be very helpful to Apache Curator and others if ZooKeeper published its testing code as a Maven test JAR. Curator, for example, could use it to improve its testing server, making it easier to inject error conditions without resorting to forced time delays and other hacks.

NOTE: if we move forward with gRPC (ZOOKEEPER-102) that would be in a new module and this would be required. So, might as well do it now.
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 2 days ago 0|z0aqmo:
ZooKeeper ZOOKEEPER-3702

AdminServer stops responding (possible deadlock)

Bug Open Major Unresolved Unassigned Craig Condit Craig Condit 20/Jan/20 15:41   20/Jan/20 17:13   3.5.6   server   0 1   CentOS 7.x, JDK 8 (various patch levels) We have been running Zookeeper 3.5.6 on several clusters for a while now, and have noticed (pretty consistently) that the new Admin Server seems to stop responding (hangs) after the ZK service has been up and running for a while. I'm not sure what causes this, but it seems to happen fairly reliably after some time (sometimes 10 minutes or more). This manifests as curl (or any other HTTP client) hanging while attempting to access any URL from the admin server port, even the top level which normally just returns a generic Jetty 404 error.

Possibly this was triggered by a Jetty version update?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 3 days ago 0|z0ap80:
ZooKeeper ZOOKEEPER-3701

Split brain on log disk full

Bug Patch Available Blocker Unresolved Andor Molnar Ivan Kelly Ivan Kelly 20/Jan/20 05:02   14/Feb/20 14:31   3.4.13, 3.5.6       0 5 0 22800   We ran into a situation where the cluster ended up with split brain when the log disk filled up on a node.

The ZK cluster (3 nodes) in question was being used as the metadata store for Pulsar. There was an outage in the Pulsar cluster, where two of the ZK nodes filled up their log disks, causing the cluster to lose quorum. Once we rectified the full-disk situation and restarted the nodes, everything seemed to work, but we started getting a lot of log messages about UpdateMetadataLoop retrying. UpdateMetadataLoop is used to update BookKeeper ledger metadata. If it sees a write conflict it rereads the znode, checks whether the update needs to happen, applies it, and writes. These retries were flooding the log on a subset of the brokers. It turned out that it was reading a znode with version 0, but when it tried the setData with version set to 0 it failed because the znode had a version of 2 (there were many instances of this). After investigating this, we saw that the znode had a different stat and value on ZK-1 from that on ZK-0 & ZK-2.

We resolved the situation by deleting the log and snapshots from ZK-1 and restarting, at which point everything went back to normal. Had ZK-1 managed to become leader we would have been in a lot of trouble, but thankfully this didn't happen.

For the sequence of events that led to split brain, I'll refer to the following code.
{code}
public class FileTxnSnapLog {
    ...
    public boolean truncateLog(long zxid) throws IOException {
        // close the existing txnLog and snapLog
        close();

        // truncate it
        FileTxnLog truncLog = new FileTxnLog(dataDir);
        boolean truncated = truncLog.truncate(zxid);
        truncLog.close();

        // re-open the txnLog and snapLog
        // I'd rather just close/reopen this object itself, however that
        // would have a big impact outside ZKDatabase as there are other
        // objects holding a reference to this object.
        txnLog = new FileTxnLog(dataDir);
        snapLog = new FileSnap(snapDir);

        return truncated;
    }

    public void close() throws IOException {
        txnLog.close();
        snapLog.close();
    }
}

public class FileSnap implements SnapShot {
    ...
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
        throws IOException {
        if (!close) {
            // actual snapshot code
        }
    }

    @Override
    public synchronized void close() throws IOException {
        close = true;
    }
}
{code}

The sequence of events that lead to the failure are:

| 2020-01-04 01:56:56Z | ZK-2 fails to write to its transaction log due to disk full. ZK-2 is still participating in leader election. ZK-2 becomes a follower of ZK-1. ZK-1 sends TRUNC to ZK-2. truncLog.truncate on ZK-2 throws an exception because of the disk being full, and leaves the process in a broken state. |
|2020-01-04 02:35:23Z | ZK-2 removes 9 transaction logs from disk (bringing it from 100% to 19%). It doesn't recover because it's in a broken state. |
|2020-01-09 08:57:33Z| ZK-1 fails to write to its transaction log due to disk full. Restarts as follower. Goes into loop of dropping from quorum (because it can't update transaction log)|
|2020-01-09 08:59:33Z |ZK-1 receives snapshot from leader (ZK-0) (at 1e00000000). ZK-1 persists snapshot, but fails to add subsequent transactions to log due to lack of space. ZK-1 drops from quorum.|
|2020-01-09 09:00:12Z |ZK-1 joins quorum as follower. 1e00000000 is close enough to leader to receive TRUNC(1d0000001d). TRUNC fails because txnLog can't flush on close() in truncateLog. ZK-1 goes into loop, dropping and joining quorum.|
|2020-01-09 09:39:00Z |ZK-1 runs purgeTxnLog. Process doesn't recover due to truncation exception having broken FileTxnSnapLog.|
|2020-01-09 19:28:37Z |ZK-1 is restarted. ZK-1 joins quorum as follower. ZK-1 receives TRUNC(1d0000001d). In this case, txnLog.close() can succeed because there's nothing to flush. snapLog is closed. truncLog.truncate fails with "java.io.IOException: No log files found to truncate! This could happen if you still have snapshots from an old setup or log files were deleted accidentally or dataLogDir was changed in zoo.cfg.". It's true that there are no log files to truncate because the snapshot is at 1e00000000 which was received from the leader at 08:59 and nothing has been logged since. In any case, FileTxnSnapLog is in another inconsistent state. snapLog is closed. txnLog is closed, but nothing was ever written to it, so it looks like brand new.|
|2020-01-09 19:29:04Z| ZK-2 is restarted. ZK-2 & ZK-0 are now in a good state, so they can make progress. Transactions start to be logged.|
|2020-01-09 19:33:16Z| ZK-1 joins the quorum. As progress has been made, it receives a SNAP from the leader at 6b30001183a. It writes a snapshot, which ultimately calls FileSnap#serialize. Nothing hits the snapshot disk, because FileSnap is in closed state since 19:28. ZK-1 starts logging transactions to its log disk.|
|2020-01-09 19:42:00Z |We do a rolling restart of the cluster.|
|2020-01-09 19:45:11Z |ZK-1 loads the last snapshot that has been persisted to disk (1e00000000), and applies all log entries with zxid greater than the snapshot (6b30001183a onwards). |
|2020-01-09 19:47:35Z |ZK-2 & ZK-1 form a quorum, ZK-2 leading. ZK-1 reports its lastZxid as 6b30001b32f and gets a DIFF from ZK-2.|

From this point on, the cluster has split brain. ZK-1 is missing all transactions between 1e00000000 and 6b30001183a.

There are a couple of places where failing differently would stop this problem:
- An exception in truncateLog should nuke the process. Even without the split brain occurring, the processes limped on in a broken state for days and required human intervention to get going again.
- snapLog and txnLog should be defensively nulled after they're closed.
- FileSnap#serialize should not fail silently when close=true. This is really bad. It should at least throw an exception.
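The third fix can be sketched like this (a simplified, hypothetical stand-in for FileSnap, not the actual patch): a closed snapshot writer refuses to "succeed" silently, so the caller learns immediately that nothing reached disk.

```java
import java.io.IOException;

public class FailFastSnap {
    // Sketch of the third fix above: refuse to drop a snapshot silently when
    // the writer has already been closed. In the incident, the silent no-op
    // let ZK-1 believe a SNAP from the leader had been persisted.
    private boolean closed = false;

    public synchronized void serialize() throws IOException {
        if (closed) {
            throw new IOException("FileSnap is closed; refusing to skip snapshot silently");
        }
        // ... actual snapshot code would run here ...
    }

    public synchronized void close() {
        closed = true;
    }
}
```

Had serialize() thrown at 19:33:16, ZK-1 would have failed loudly instead of restarting later from the stale 1e00000000 snapshot.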


The issue occurred with 3.4.13 running on a kubernetes cluster. The bad code paths still exist on master.
100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
4 weeks, 6 days ago 0|z0aomg:
ZooKeeper ZOOKEEPER-3700

Several types of QuorumCxnManager connection error logs include exception text that add no value

Improvement Open Minor Unresolved Unassigned Jason Kania Jason Kania 19/Jan/20 20:27   19/Jan/20 20:29   3.5.6   quorum   0 1   Currently the QuorumCnxManager connectOne method dumps a stack trace when it encounters java.net.SocketTimeoutException: Read timed out, or java.net.ConnectException: Connection refused, in addition to providing an error message.

As an example, the following output is seen:

[2020-01-20 00:21:23,828] WARN Cannot open channel to 3 at election address aaa-3/10.0.1.3:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:838)

 

These exceptions are frequently output when launching and restarting several ZooKeeper servers, and they create confusion about what are normal operations and expected errors. I would suggest that a few of these specific, expected errors could be detected and reduced to the text error output alone, without the accompanying stack trace.

When launching the first node in a 3 node quorum cluster, about 120 lines of error output are generated for a working launch.

I would be happy to make some of these changes if this approach is agreeable to the maintainers. My approach would be to look for the specific standard conditions in the exception handling and eliminate the exception stack trace where present in these cases.
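The suggested approach can be sketched as follows (class and method names are hypothetical, not from any patch): recognize the routine connection failures seen during ensemble startup and format them as a single line, keeping the full stack trace only for unexpected exceptions.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class ElectionConnectLogging {
    // Sketch: the two "expected" startup errors named in this ticket are
    // reduced to a one-line message; anything else keeps its stack trace.
    public static boolean isExpectedStartupError(Exception e) {
        return e instanceof ConnectException || e instanceof SocketTimeoutException;
    }

    public static String format(int sid, String electionAddr, Exception e) {
        if (isExpectedStartupError(e)) {
            // expected during launch/restart: message only, no stack trace
            return "Cannot open channel to " + sid + " at election address "
                + electionAddr + ": " + e.getMessage();
        }
        // unexpected: caller should log this message WITH the stack trace
        return "Cannot open channel to " + sid + " at election address " + electionAddr;
    }
}
```

With this, a working first-node launch would emit one WARN line per unreachable peer instead of ~120 lines of stack traces.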
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 3 days ago 0|z0ao80:
ZooKeeper ZOOKEEPER-3699

upgrade jackson-databind to address CVE-2019-20330

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 18/Jan/20 14:20   14/Feb/20 10:23 23/Jan/20 05:20 3.6.0, 3.5.6, 3.7.0 3.6.0, 3.5.7, 3.7.0 security   0 2 0 2400   owasp is flagging
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/zookeeper-master-maven-owasp/329/console

> [ERROR] jackson-databind-2.9.10.1.jar: CVE-2019-20330

"FasterXML jackson-databind 2.x before 2.9.10.2 lacks certain net.sf.ehcache blocking"

I don't believe we use "ehcache" but we should upgrade asap.
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks ago 0|z0ands:
ZooKeeper ZOOKEEPER-3698

NoRouteToHostException when starting large ZooKeeper cluster on localhost

Bug Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 17/Jan/20 05:15   19/Mar/20 05:09 23/Jan/20 07:44   3.6.0, 3.7.0     0 2 0 5400   During testing RC for 3.6.0, we found that ZooKeeper cluster with large number of ensemble members (e.g. 23) can not start properly. We see a lot of warnings in the log:
{code:java}
2020-01-15 20:02:13,431 [myid:13] - WARN
[ListenerHandler-phunt-MBP13.local/192.168.1.91:4193:QuorumCnxManager@691]
- None of the addresses (/192.168.1.91:4190) are reachable for sid 10
java.net.NoRouteToHostException: No valid address among [/192.168.1.91:4190]
{code}
 and also:
{code:java}
2020-01-17 11:02:26,177 [myid:4] - WARN  [Thread-2531:QuorumCnxManager$SendWorker@1269] - destination address /127.0.0.1 not reachable anymore, shutting down the SendWorker for sid 6
{code}
The exceptions happen when the new MultiAddress feature tries to filter the unreachable hosts from the address list. This involves calling the InetAddress.isReachable method with a default timeout of 500ms, which goes down to a native call in Java and basically tries to do a ping (an ICMP echo request) to the host. Naturally, localhost should always be reachable. For some reason, this call fails (times out or is simply refused) on Mac if we have many ensemble members. I tested with 9 members and the cluster started properly. With 11, 13, and 15 members it took more and more time for the cluster to start, and the "NoRouteToHostException" started to appear in the logs. After around 1 minute the 15-member cluster started, but obviously this is not good. (I also tried with JDK 11 but found the same behaviour.)

 

On Linux, I haven't been able to reproduce the problem. I tried with 5, 9, 15 and 23 ensemble members and the quorum always seems to start properly in a few seconds. (I used OpenJDK 1.8.232 on Ubuntu 18.04.)

*Update*:

On Mac, the ICMP rate limit is set to 250 by default. You can turn this off using the following command: sudo sysctl -w net.inet.icmp.icmplim=0
(see [https://krypted.com/mac-os-x/disable-icmp-rate-limiting-os-x/])

Using the above command before starting the 23-member cluster locally seems to solve the problem for me (can someone verify?). The question is whether this workaround is enough.

As far as I can tell, the current code will generate {{2*A*(M-1)}} ICMP calls in each ZooKeeper server during startup, where {{'M'}} is the number of ensemble members and {{'A'}} is the number of election addresses provided for each member. This is not that high if each ZooKeeper server is started on a different machine, but if we start many ZooKeeper servers on a single machine, it can quickly go beyond Mac's predefined limit of 250.
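A worked example of that estimate (a sketch; the factor of 2 and the formula come from the description above):

```java
// Per-server ICMP probe estimate: 2*A*(M-1), where M is the number of
// ensemble members and A the number of election addresses per member.
public class IcmpProbeEstimate {

    public static long probesPerServer(int members, int addressesPerMember) {
        return 2L * addressesPerMember * (members - 1);
    }

    public static void main(String[] args) {
        // 23 members with one election address each: 44 probes per server,
        // 23 * 44 = 1012 probes cluster-wide -- well past Mac's default
        // ICMP rate limit of 250 when everything runs on one machine.
        System.out.println(probesPerServer(23, 1)); // prints 44
    }
}
```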
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks ago 0|z0am3c:
ZooKeeper ZOOKEEPER-3697

zoo_amulti can attempt to free invalid memory after marshalling errors.

Bug Open Minor Unresolved Unassigned Jeremy Sowden Jeremy Sowden 13/Jan/20 09:53   13/Feb/20 03:49   3.4.14, 3.5.6   c client   0 1   {{zoo_amulti}} only initializes request objects if {{rc == ZOK}}, but it unconditionally calls {{free_duplicate_path}}.  For example:

{noformat}
case ZOO_CHECK_OP: {
    struct CheckVersionRequest req;
    rc = rc < 0 ? rc : CheckVersionRequest_init(zh, &req,
            op->check_op.path, op->check_op.version);
    rc = rc < 0 ? rc : serialize_CheckVersionRequest(oa, "req", &req);
    enter_critical(zh);
    entry = create_completion_entry(zh, h.xid, COMPLETION_VOID, op_result_void_completion, result, 0, 0);
    leave_critical(zh);
    free_duplicate_path(req.path, op->check_op.path);
    break;
}
{noformat}

This means that if there is a marshalling error in one operation, then for all the later operations the request will not be initialized, the value of {{req.path}} will be undefined, and {{free_duplicate_path}} may attempt to free an invalid pointer.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
9 weeks, 3 days ago 0|z0agbc:
ZooKeeper ZOOKEEPER-3696

deprecate DigestAuthenticationProvider which uses broken SHA1

Task Open Blocker Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 12/Jan/20 12:56 19/Mar/20 20:03 19/Mar/20 07:51     3.7.0, 3.6.1, 3.5.8 security   0 2   DigestAuthenticationProvider is using SHA-1, which is known to be broken; e.g. recently:
https://shattered.io/
https://sha-mbles.github.io/
etc...

We should mark DigestAuthenticationProvider as deprecated at a minimum, perhaps even remove it asap. The docs should also reflect this (i.e., don't use it).

We could replace DigestAuthenticationProvider with DigestAuthenticationProvider3 or similar (using SHA-3, not SHA-2, if we do so), or perhaps a version that allows the user to select. Regardless, it would be good to give the end user a simple option.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 hours ago 0|z0afbk:
ZooKeeper ZOOKEEPER-3695

Source release tarball does not match repository in 3.6.0

Task Resolved Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 11/Jan/20 16:44   01/Feb/20 06:08 17/Jan/20 11:09 3.6.0 3.6.0, 3.7.0 build   0 1 0 7200   During the release of 3.6.0, rc0, I noticed that the source tarball differs from the repository:
- there is no "dev/docker" directory (so we are missing a part of the codebase, even this is not so important)
- there is no "zookeeper-metrics-providers" directory (so the project is not buildable)
- the c client directory contains temporary files (so we are including 'binaries')

I have also noted that the NOTICE file reports 2019 when it should be 2020.
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 6 days ago 0|z0aexs:
ZooKeeper ZOOKEEPER-3694

Use Map computeIfAbsent in AvgMinMaxCounterSet Class

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 11/Jan/20 11:17   16/Jan/20 15:46 16/Jan/20 15:46   3.7.0     0 1 0 2400   https://github.com/apache/zookeeper/blob/27b92caefd57a60309af06ebce29e56954ca9aac/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounterSet.java#L41

More concise to use JDK facilities for this operation.
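As a generic sketch of the suggested change (not the actual AvgMinMaxCounterSet code), the manual get-then-put pattern next to {{computeIfAbsent}}:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Generic sketch of the computeIfAbsent change (illustrative, not the
// actual counter-set code).
public class ComputeIfAbsentDemo {

    private final Map<String, LongAdder> counters = new HashMap<>();

    // Before: explicit null check and put.
    public LongAdder getCounterVerbose(String name) {
        LongAdder c = counters.get(name);
        if (c == null) {
            c = new LongAdder();
            counters.put(name, c);
        }
        return c;
    }

    // After: one call, and the LongAdder is only created when missing.
    public LongAdder getCounter(String name) {
        return counters.computeIfAbsent(name, k -> new LongAdder());
    }
}
```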
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks ago 0|z0aev4:
ZooKeeper ZOOKEEPER-3693

Use Map computeIfAbsent in AvgMinMaxCounterSet Class

Improvement Resolved Minor Duplicate David Mollitor David Mollitor David Mollitor 11/Jan/20 11:15   11/Jan/20 11:21 11/Jan/20 11:21         0 1   https://github.com/apache/zookeeper/blob/27b92caefd57a60309af06ebce29e56954ca9aac/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounterSet.java#L41

More concise to use JDK facilities for this operation.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 5 days ago 0|z0aeuw:
ZooKeeper ZOOKEEPER-3692

Change Java Package For TestStringUtils

Improvement Open Minor Unresolved Nishanth Entoor David Mollitor David Mollitor 11/Jan/20 10:54   22/Jan/20 16:28           0 1 0 600   Move {{TestStringUtils}} into the common package so that it aligns with the class being tested.

{code:none}
/zookeeper/src/main/java/org/apache/zookeeper/common/StringUtils.java
/zookeeper/src/test/java/org/apache/zookeeper/test/StringUtilTest.java
{code}
100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 5 days ago 0|z0aeug:
ZooKeeper ZOOKEEPER-3691

Use JDK String Join Method in ZK StringUtils

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 10/Jan/20 23:10   24/Jan/20 02:37           0 1 0 2400   https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#join-java.lang.CharSequence-java.lang.Iterable- 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
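For illustration, hand-rolled joining versus the JDK method referenced above (a sketch, not the actual ZK StringUtils code):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: replace a manual StringBuilder join loop with String.join.
public class JoinDemo {

    // Hand-rolled joining as commonly found in utility classes.
    public static String joinManually(List<String> parts, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        // Both produce "a,b,c".
        System.out.println(joinManually(parts, ","));
        System.out.println(String.join(",", parts));
    }
}
```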
9 weeks, 5 days ago 0|z0aep4:
ZooKeeper ZOOKEEPER-3690

Improving leader efficiency via not processing learner's requests in commit processor

Improvement Open Minor Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 10/Jan/20 20:43   27/Feb/20 21:21           0 2 0 6000   Currently, all the requests forwarded from learners will be processed like the locally received requests from leader's clients, which is non-trivial effort and not necessary to process those in CommitProcessor with session queue create/destroy

To improve the efficiency, we could skip processing those requests in leader's commit processor. Based on the benchmark, this optimization improved around 30% maximum write throughput for read/write mixed workload.
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 5 days ago 0|z0aemw:
ZooKeeper ZOOKEEPER-3689

zkCli/ZooKeeperMain relies on system properties for TLS config

New Feature Open Major Unresolved Sankalp Bhatia Ron Dagostino Ron Dagostino 10/Jan/20 12:34   19/Mar/20 12:48   3.6.0, 3.5.5, 3.5.6 3.6.1 security, server   0 4 0 6600   The command line client to ZooKeeper (org.apache.zookeeper.ZooKeeperMain, invoked via bin/zkCli.{bat,sh}) has no facility for accepting TLS client configuration (e.g. keystore/truststore location and password) except via system properties. System properties must be passed on the command line as "-D" arguments and are inherently not secure. There should be a way to pass the client TLS configuration to org.apache.zookeeper.ZooKeeperMain in a more secure way (e.g. via a file). 100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks, 6 days ago 0|z0ae80:
ZooKeeper ZOOKEEPER-3688

Use StandardCharsets UTF-8 in Jute toString

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 10/Jan/20 10:15   22/Jan/20 09:13       jute   0 1 0 1800   100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 6 days ago 0|z0ae20:
ZooKeeper ZOOKEEPER-3687

Jute Use JDK hashCode Methods for Native Types

Improvement Open Major Unresolved David Mollitor David Mollitor David Mollitor 10/Jan/20 10:03   10/Jan/20 10:03       jute   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 6 days ago 0|z0ae1c:
ZooKeeper ZOOKEEPER-3686

Use JDK Arrays hashCode for Jute

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 10/Jan/20 09:50   12/Feb/20 09:53 12/Feb/20 09:53   3.6.1 jute   0 1 0 3000   https://github.com/apache/zookeeper/blob/27b92caefd57a60309af06ebce29e56954ca9aac/zookeeper-jute/src/main/java/org/apache/jute/compiler/JBuffer.java#L82

https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#hashCode(byte[])
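A sketch of the idea (illustrative, not the generated Jute code): a typical hand-written byte[] hash agrees with java.util.Arrays.hashCode, which uses the same 31-based polynomial, so the replacement is behavior-preserving.

```java
import java.util.Arrays;

// Sketch: a hand-rolled byte[] hash versus Arrays.hashCode.
public class BufferHashDemo {

    // Typical hand-written hash over a byte array.
    public static int manualHashCode(byte[] b) {
        int result = 1;
        for (byte element : b) {
            result = 31 * result + element;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Arrays.hashCode uses the same 31-based polynomial, so these agree.
        System.out.println(manualHashCode(data) == Arrays.hashCode(data)); // prints true
    }
}
```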
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks, 6 days ago 0|z0ae08:
ZooKeeper ZOOKEEPER-3685

Use JDK Arrays Equals for Jute

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 09/Jan/20 18:04   12/Feb/20 09:49 12/Feb/20 09:49   3.6.1     0 1 0 3000   ZK Jute compiler uses its own byte Array equality check. JDK has one which is marked with the {{HotSpotIntrinsicCandidate}} annotation. This means that the JDK may have a native optimization for the routine.

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Arrays.java#L2654
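Similarly, a sketch (not the actual Jute compiler output) of a hand-written comparison next to java.util.Arrays.equals, which is the intrinsic-candidate method mentioned above:

```java
import java.util.Arrays;

// Sketch: hand-rolled byte[] equality versus Arrays.equals.
public class BufferEqualsDemo {

    public static boolean manualEquals(byte[] a, byte[] b) {
        if (a == b) return true;
        if (a == null || b == null || a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3};
        byte[] y = {1, 2, 3};
        // Same semantics, but Arrays.equals may be optimized natively by HotSpot.
        System.out.println(manualEquals(x, y) == Arrays.equals(x, y)); // prints true
    }
}
```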
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks ago 0|z0acwg:
ZooKeeper ZOOKEEPER-3683

Discard requests that are delayed longer than a configured threshold

Improvement Open Minor Unresolved Unassigned Jie Huang Jie Huang 08/Jan/20 17:47   09/Mar/20 17:49     3.7.0 server   0 2 0 12000   The RequestThrottler ensures that no more requests than the system can handle are fed into the request processor pipeline. In the meantime, the throttler queues all incoming requests and there is nothing to instruct the clients to slow down.

This new feature will mark all requests that wait in the RequestThrottler longer than the specified throttledOpWaitTime as throttled; such requests will not see any processing other than being fed down the pipeline, preserving the order of all requests.

The FinalProcessor will issue an error response (new error code: ZTHROTTLEDOP) for these undigested requests. The intent is for the clients to not retry them immediately.

Also, the fact that throttled requests go unprocessed will speed up the work of the entire pipeline. Throttled requests are not communicated between servers and only travel through the server they belong to.
100% 100% 12000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 1 day ago 0|z0abew:
ZooKeeper ZOOKEEPER-3682

Stop initializing new SSL connection if ZK server is shutting down

Improvement Resolved Minor Fixed Unassigned Jie Huang Jie Huang 08/Jan/20 11:34   26/Feb/20 16:42 26/Feb/20 16:42   3.7.0 server   0 2 0 3000   ZK keeps accepting new connections while it is shutting down, then immediately closes them when it finds out that the ZK server is not running. This was not a big deal before SSL was enabled, since creating TCP connections is relatively cheap.

With SSL widely enabled, creating SSL connections involves a handshake that takes non-trivial CPU time, which is wasted since the connections are closed right afterwards.

This JIRA is going to stop initializing the TLS handshake if the zkServer is not serving, to save resources.
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 1 day ago 0|z0ab20:
ZooKeeper ZOOKEEPER-3681

Add s390x support for Travis build

New Feature Resolved Major Fixed Enrico Olivelli Sangita Nalkar Sangita Nalkar 08/Jan/20 02:46   08/Jan/20 04:42 08/Jan/20 04:41   3.7.0     0 1 0 1800   As Travis CI officially supports s390x builds ([https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z]), adding support for the same.

Raised PR "[https://github.com/apache/zookeeper/pull/1166]" with the changes.
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 1 day ago 0|z0aabs:
ZooKeeper ZOOKEEPER-3680

Standardize on commons-lang 2.6 for zookeeper-contrib modules

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 07/Jan/20 13:56   07/Jan/20 22:49           0 1 0 1200   ZooKeeper parent pom defines 2.6, but 2.4 still exists elsewhere. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 2 days ago 0|z0a9r4:
ZooKeeper ZOOKEEPER-3679

Upgrade maven-compiler-plugin For ZooKeeper-jute

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 07/Jan/20 13:43   08/Jan/20 04:44 08/Jan/20 04:44   3.7.0     0 1 0 1200   Let it match the same version as the rest of the project (3.8). 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 1 day ago 0|z0a9q8:
ZooKeeper ZOOKEEPER-3678

Remove Redundant GroupID from Maven POMs

Improvement Resolved Trivial Fixed David Mollitor David Mollitor David Mollitor 07/Jan/20 13:34   20/Jan/20 04:59 13/Jan/20 08:11   3.7.0     0 1 0 4800   No need to declare a {{groupId}} in each POM because it is inherited from the parent POM file. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 2 days ago 0|z0a9q0:
ZooKeeper ZOOKEEPER-3677

owasp checker failing for - CVE-2019-17571 Apache Log4j 1.2 deserialization of untrusted data in SocketServer

Task Closed Major Fixed Enrico Olivelli Patrick D. Hunt Patrick D. Hunt 07/Jan/20 12:22   14/Feb/20 10:23 18/Jan/20 14:10   3.5.7, 3.7.0, 3.6.1 security   0 1 0 1200   This doesn't look like it impacts us (we don't use SocketServer); however, we should figure out what to do, as the owasp checker is failing and the rating is quite high (9.8, bound to get interest).

https://nvd.nist.gov/vuln/detail/CVE-2019-17571

Perhaps ZOOKEEPER-2342 should be prioritized.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 5 days ago
Reviewed
0|z0a9mw:
ZooKeeper ZOOKEEPER-3676

Clean Up TxnLogProposalIterator

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 06/Jan/20 21:37   02/Feb/20 22:38           0 1 0 2400   * Use JDK Collections.emptyIterator where needed
* The code manually returns an emptyIterator when an error occurs, but it's also possible to return an emptyIterator by passing 'null' to the TxnLogProposalIterator constructor. This is a bit ambiguous... why allow both? Null values 'suck', so I think it's better to just make sure that emptyIterator is returned where needed and throw an NPE if a 'null' value is passed in.
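A sketch of the proposed shape (illustrative names, not the actual TxnLogProposalIterator): reject null up front and return Collections.emptyIterator() explicitly instead of tolerating a null input.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Sketch of the cleanup: fail fast on null, return emptyIterator explicitly.
public class ProposalSource {

    private final List<String> proposals;

    public ProposalSource(List<String> proposals) {
        // Fail fast instead of silently treating null as "empty".
        this.proposals = Objects.requireNonNull(proposals, "proposals");
    }

    public Iterator<String> iterator() {
        return proposals.isEmpty() ? Collections.emptyIterator() : proposals.iterator();
    }
}
```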
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 1 day ago 0|z0a8rc:
ZooKeeper ZOOKEEPER-3675

Clean Up committedLog Interaction in ZKDatabase

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 06/Jan/20 21:22   27/Jan/20 12:44           0 1 0 2400   * To be proper, minCommittedLog/maxCommittedLog should only be modified under a lock.
* maxCommittedLog is potentially set twice (to the same value) for each call to the method
* Streamline code in addCommittedProposal
* Pre-initialize the committedLog data structure to its full size
* Remove unused commitLogBuffer
* Remove synchronization of the getCommittedLog() method; it is already protected by the lock
* Unify grabbing locks outside of try blocks
* Fix off-by-one error in the size of the buffer
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 2 days ago 0|z0a8qg:
ZooKeeper ZOOKEEPER-3674

zookeeper.ssl.clientAuth ignored

Bug Resolved Major Fixed Unassigned Ron Dagostino Ron Dagostino 06/Jan/20 13:22   18/Feb/20 10:14 18/Feb/20 10:14 3.5.5, 3.5.6 3.5.7 security, server   0 2   Setting zookeeper.ssl.clientAuth currently has no impact; a client certificate is currently always required. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 2 days ago 0|z0a8a8:
ZooKeeper ZOOKEEPER-3673

Getting a snapshot from leader causes Connection reset, shuts down Follower, and repeats forever

Bug Open Major Unresolved Unassigned jx jx 05/Jan/20 21:45   05/Jan/20 21:45   3.4.12       0 1   When one broker restarts, ZK repeats the following forever:

1. Getting a snapshot from leader

2. Snapshotting to disk

3. cause Connection reset

4. shutdown Follower

 

Does getting a snapshot from the leader, or snapshotting to disk, cause a syncLimit timeout?
{code:java}
2020-01-05 22:56:31,168 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:java.io.tmpdir=/tmp
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:java.compiler=<NA>
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:os.name=Linux
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:os.arch=amd64
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:os.version=3.10.104-1-tlinux2_kvm_guest-0022.tl2
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:user.name=user_00
2020-01-05 22:56:31,169 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:user.home=/home/user_00
2020-01-05 22:56:31,170 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment@100] - Server environment:user.dir=/usr/local/services/zookeeper-3_4_12-V8-32-400-cluster-001-0.0
2020-01-05 22:56:31,171 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2020-01-05 22:56:31,183 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 81
2020-01-05 22:56:31,185 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: 100.94.122.151 to address: /100.94.122.151
2020-01-05 22:56:31,190 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner@336] - Getting a snapshot from leader 0xb1a0a15a6
2020-01-05 22:57:19,023 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FileTxnSnapLog@296] - Snapshotting: 0xb1a0a15a6 to /data/zookeeper/version-2/snapshot.b1a0a15a6
2020-01-05 22:57:53,554 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:Learner@387] - Got zxid 0xb1a0a15a7 expected 0x1
2020-01-05 22:57:53,596 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@90] - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:94)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:87)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:380)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2020-01-05 22:57:53,615 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@169] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:169)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:985)
2020-01-05 22:57:53,615 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FollowerZooKeeperServer@140] - Shutting down
2020-01-05 22:57:53,615 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer@909] - LOOKING
2020-01-05 22:57:53,616 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@813] - New election. My id = 3, proposed zxid=0xb1a0a15a6
2020-01-05 22:57:53,617 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message format version), 3 (n.leader), 0xb1a0a15a6 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEpoch) LOOKING (my state)
2020-01-05 22:57:53,618 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0xa0000001b (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1 (n.sid), 0xb (n.peerEpoch) LOOKING (my state)
2020-01-05 22:57:53,618 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0xa0000001b (n.zxid), 0x1 (n.round), LEADING (n.state), 2 (n.sid), 0xb (n.peerEpoch) LOOKING (my state)
2020-01-05 22:57:53,619 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer@979] - FOLLOWING
2020-01-05 22:57:53,619 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2020-01-05 22:57:53,619 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 3
2020-01-05 22:57:53,619 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: 100.94.122.151 to address: /100.94.122.151
2020-01-05 22:57:53,628 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner@336] - Getting a snapshot from leader 0xb1a0a4842
2020-01-05 22:58:34,196 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FileTxnSnapLog@296] - Snapshotting: 0xb1a0a4842 to /data/zookeeper/version-2/snapshot.b1a0a4842
2020-01-05 22:59:03,670 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:Learner@387] - Got zxid 0xb1a0a4843 expected 0x1
2020-01-05 22:59:03,692 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:Follower@90] - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 3 days ago 0|z0a7fc:
ZooKeeper ZOOKEEPER-3672

BambuBet

Bug Open Major Unresolved Unassigned BambuBet BambuBet 05/Jan/20 04:07   17/Feb/20 06:42   3.4.14 3.7.0 java client   0 3   BambuBet is today's trusted online gambling site because [#BambuBet]http://www.bambubet.org/ has marketing agents spread across all of Indonesia to market its products throughout the country. Of course it does not only have markets spread across all of Indonesia; BambuBet also has good service, such as fast deposit and withdrawal processing. Not only that, BambuBet also offers services such as giving lottery predictions for tomorrow or for today. Of course, as a [#Bandar Togel Terpercaya]http://www.bambubet.org/ in Indonesia, BambuBet never fools around in marketing its products to Indonesia. All facilities and whatever the members want will be provided and served well by BambuBet. http://www.bambubet.org/ WL, patch 9223372036854775807 WA : +62 823 6076 8385
Line : BambuBet

For further info you can chat with us on our liveChat, boss:

https://tawk.to/chat/5d4abe8c7d27204601c9bf98/default
No Perforce job exists for this issue. 0 9223372036854775807
4 weeks, 3 days ago BambuBet , Bandar Togel Terpercaya , Bocoran Togel Besok BambuBet http://www.bambubet.org/ 0|z0a6z4:
ZooKeeper ZOOKEEPER-3671

Use ThreadLocalConcurrent to Replace Random and Math.random

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 30/Dec/19 17:13   14/Jan/20 05:25           0 2 0 6000   {quote}
*_ThreadLocalRandom_ is a combination of _[ThreadLocal|https://www.baeldung.com/java-threadlocal]_ and _Random_ classes, which is isolated to the current thread.* Thus, it achieves better performance in a multithreaded environment by simply avoiding any concurrent access to the _Random_ objects.
{quote}

https://www.baeldung.com/java-thread-local-random

I also had a conversation with an Oracle Java engineer who indicated that {{ThreadLocalRandom}} is preferable for new development.
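A minimal sketch of the replacement (the method name here is illustrative, not from the ZK codebase): each thread gets its own generator, with no contended shared state.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: ThreadLocalRandom instead of a shared Random or Math.random().
public class RandomDemo {

    public static int randomBackoffMillis(int bound) {
        // No shared state: each calling thread uses its own generator.
        return ThreadLocalRandom.current().nextInt(bound);
    }

    public static void main(String[] args) {
        int v = randomBackoffMillis(1000);
        System.out.println(v >= 0 && v < 1000); // prints true
    }
}
```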
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks ago 0|z0a3d4:
ZooKeeper ZOOKEEPER-3670

Clean Up Log Statements for SLF4J

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 30/Dec/19 16:28   27/Jan/20 11:27           0 1 0 7800   Not changing anything controversial, just some of the obvious stuff I noticed across the project. 100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 3 days ago 0|z0a3bs:
ZooKeeper ZOOKEEPER-3669

Use switch Statement in ClientCnxn SendThread

Task Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 30/Dec/19 13:52   29/Jan/20 08:07 21/Jan/20 11:54   3.7.0, 3.6.1     0 1 0 2400   [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L870] 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 2 days ago 0|z0a36w:
ZooKeeper ZOOKEEPER-3668

Clean up release package for 3.6.0

Task Resolved Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 30/Dec/19 10:10   06/Jan/20 05:53 06/Jan/20 05:53 3.6.0 3.6.0 build, license   0 1 0 4200   At git sha 034bcda589ae9d64ab3467b254179ed37f9b1635 we have the following issues regarding packaging and licensing.
- there is no "LICENSE" file for snappy and for metrics-core
- we need to update the copyright year in NOTICE files
- we need to copy the Airlift reference from NOTICE in the source root to the NOTICE file reported in the binary package
- copy the Java 8 warning from branch-3.5
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 3 days ago 0|z0a320:
ZooKeeper ZOOKEEPER-3667

set jute.maxbuffer hexadecimal number throw parseInt error

Bug Closed Major Fixed Sujith Simon bright.zhou bright.zhou 27/Dec/19 07:46   14/Feb/20 10:23 20/Jan/20 02:42 3.5.6 3.6.0, 3.5.7, 3.7.0 java client   0 3 0 4200   100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
8 weeks, 3 days ago 0|z0a1a8:
ZooKeeper ZOOKEEPER-3666

remove the deprecated LogFormatter tool

Improvement Resolved Minor Fixed Nishanth Entoor maoling maoling 26/Dec/19 22:07   22/Jan/20 16:25 21/Jan/20 12:19   3.7.0 scripts   0 2 0 2400   Since 3.5.5 we use *_TxnLogToolkit_*, which is better, and it's time to delete everything related to *_LogFormatter_*, including the following:

 - the class: org.apache.zookeeper.server.LogFormatter

 - the related docs
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 2 days ago 0|z0a0s0:
ZooKeeper ZOOKEEPER-3665

support Client side caching in ZooKeeper

New Feature Open Major Unresolved Unassigned maoling maoling 26/Dec/19 22:00   28/Dec/19 05:49       java client, server   0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 5 days ago 0|z0a0rs:
ZooKeeper ZOOKEEPER-3664

test

Test Open Minor Unresolved Unassigned Martha Serafina Martha Serafina 24/Dec/19 06:10 19/Mar/20 11:52 17/Feb/20 06:44   3.2.3, 3.4.14 3.7.0 c client   0 3   [pkv games|https://adilqq.info/] AdilQQ is a real-money online [judi pkv games|https://adilqq.net/] site that is already well known today, also called a trusted poker site in Indonesia, with pokerv pkv games server technology, the highest win rate, and the cheapest deposits available today. The poker qq online gambling games on the AdilQQ site, such as domino qq, [domino 99|https://community.atlassian.com/t5/user/viewprofilepage/user-id/3686536], bandarqq, aduq, capsa susun, bandarpoker, sakong, and bandar66 adu balak, are of course already enjoyed by fans of trusted online poker gambling and the latest domino99 online. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
4 weeks, 3 days ago 0|z09yu0:
ZooKeeper ZOOKEEPER-3663

Clean Up ZNodeName Class

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 23/Dec/19 14:54   21/Jan/20 12:15 21/Jan/20 12:14   3.7.0     0 1 0 5400   # Make class immutable
# Enforce null check of constructor
# Enhance and add unit tests
# Make 'sequence' an {{Optional}} field
# Change the name of the getter 'getZNodeName' to the more appropriate 'getSequence'

 

This is a {{default}}-scoped (package-private) class, so please allow for some change in its API
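The list above can be sketched as a compact Java class. This is an illustrative sketch under assumed names (ZNodeNameSketch and its parsing logic are hypothetical, not the actual patch):

```java
import java.util.Objects;
import java.util.Optional;

// Illustrative sketch of an immutable ZNodeName-like class (names are hypothetical).
final class ZNodeNameSketch {
    private final String name;
    private final String prefix;
    private final Optional<Integer> sequence; // 'sequence' held as an Optional field

    ZNodeNameSketch(String name) {
        // Enforced null check in the constructor.
        this.name = Objects.requireNonNull(name, "name cannot be null");
        int idx = name.lastIndexOf('-');
        Optional<Integer> seq = Optional.empty();
        String p = name;
        if (idx >= 0) {
            try {
                seq = Optional.of(Integer.parseInt(name.substring(idx + 1)));
                p = name.substring(0, idx);
            } catch (NumberFormatException e) {
                // Not a sequential node; keep the full name as the prefix.
            }
        }
        this.prefix = p;
        this.sequence = seq;
    }

    String getName() { return name; }
    String getPrefix() { return prefix; }
    // Renamed from 'getZNodeName' to the more descriptive 'getSequence'.
    Optional<Integer> getSequence() { return sequence; }
}
```

All fields being final makes the class immutable, and callers handle the missing-sequence case through {{Optional}} instead of a sentinel value.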
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 2 days ago 0|z09y2o:
ZooKeeper ZOOKEEPER-3662

Remove NPE Possibility in Follower Class

Improvement Open Minor Unresolved David Mollitor David Mollitor David Mollitor 23/Dec/19 12:58   02/Mar/20 10:54           0 1 0 4800   {code:java|title=Follower.java}
public long getZxid() {
    try {
        synchronized (fzk) {
            return fzk.getZxid();
        }
    } catch (NullPointerException e) {
        LOG.warn("error getting zxid", e);
    }
    return -1;
}
{code}

I traced the code and there is no reason to catch an NPE here. Instead, add restrictions up front to make sure an NPE can never happen.
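A minimal sketch of that shape: the invariant is enforced once at construction, so the defensive catch disappears (stand-in classes with assumed names, not the actual Follower code):

```java
import java.util.Objects;

// Stand-in for FollowerZooKeeperServer (hypothetical, for illustration only).
class FollowerZooKeeperServerStub {
    private final long zxid;
    FollowerZooKeeperServerStub(long zxid) { this.zxid = zxid; }
    long getZxid() { return zxid; }
}

// Sketch: guarantee the non-null invariant up front instead of catching NPE later.
class FollowerSketch {
    private final FollowerZooKeeperServerStub fzk;

    FollowerSketch(FollowerZooKeeperServerStub fzk) {
        // Fail fast: after this, an NPE can no longer occur in getZxid().
        this.fzk = Objects.requireNonNull(fzk, "FollowerZooKeeperServer is null");
    }

    long getZxid() {
        synchronized (fzk) {
            return fzk.getZxid();
        }
    }
}
```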
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 3 days ago 0|z09y08:
ZooKeeper ZOOKEEPER-3661

Failed to execute goal on project zookeeper-recipes

Bug Resolved Major Not A Problem Enrico Olivelli maoling maoling 23/Dec/19 01:51   26/Dec/19 21:55 24/Dec/19 11:26         0 2   mvn clean package -Dmaven.test.skip=true -U
{code:java}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35.080 s
[INFO] Finished at: 2019-12-23T14:46:29+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project zookeeper-recipes: Could not resolve dependencies for project org.apache.zookeeper:zookeeper-recipes:pom:3.7.0-SNAPSHOT: Could not find artifact org.apache.zookeeper:zookeeper:jar:tests:3.7.0-SNAPSHOT in spring-snapshot (http://repo.spring.io/snapshot) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :zookeeper-recipes

{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 6 days ago 0|z09xfc:
ZooKeeper ZOOKEEPER-3660

support the lz4 compress mode for snapshot

Improvement Open Major Unresolved Unassigned maoling maoling 23/Dec/19 01:31   23/Dec/19 01:39           0 2   Currently we have the following compression modes for snapshots; we also need an lz4 mode, which is widely used by other projects such as HBase and Kafka:
{code:java}
public enum StreamMode {
    GZIP("gz"),
    SNAPPY("snappy"),
    CHECKED("");
{code}
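The addition might look like the sketch below; the existing constants are reproduced from the excerpt above, while the LZ4 entry and its "lz4" file extension are assumptions:

```java
// Illustrative extension of the StreamMode enum with an LZ4 entry.
enum StreamModeSketch {
    GZIP("gz"),
    SNAPPY("snappy"),
    LZ4("lz4"),   // proposed new mode; the extension string is an assumption
    CHECKED("");

    private final String fileExtension;

    StreamModeSketch(String fileExtension) {
        this.fileExtension = fileExtension;
    }

    String getFileExtension() {
        return fileExtension;
    }
}
```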
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 3 days ago 0|z09xew:
ZooKeeper ZOOKEEPER-3659

make the WatchManagerFactory log more readable

Improvement Open Minor Unresolved Rabi Kumar K C maoling maoling 21/Dec/19 01:19   17/Feb/20 22:15       server   0 2 0 4800   {code:java}
2019-12-19 20:28:58,854 [myid:] - INFO [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2019-12-19 20:28:58,854 [myid:] - INFO [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
{code}
These logs look unclear and seem like duplicates. What we want is:
{code:java}
2019-12-19 20:28:58,854 [myid:] - INFO [main:WatchManagerFactory@42] - dataWatches is using org.apache.zookeeper.server.watch.WatchManager as watch manager
2019-12-19 20:28:58,854 [myid:] - INFO [main:WatchManagerFactory@42] - childWatches is using org.apache.zookeeper.server.watch.WatchManager as watch manager
{code}
 
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 2 days ago 0|z09wjs:
ZooKeeper ZOOKEEPER-3658

Potential data inconsistency due to txns gap in committedLog when ZkDB not fully shutdown

Bug Resolved Major Invalid Fangmin Lv Fangmin Lv Fangmin Lv 20/Dec/19 03:22   16/Jan/20 15:44 16/Jan/20 15:44 3.6.0, 3.5.6   server   0 1   During DIFF sync, the txns are applied to the learner's DataTree but are not added to the in-memory committed txns cache in ZkDatabase. If this server becomes the new leader later and other servers try to sync with it, it may cause data inconsistency because part of the txns are missing.

This is not a problem if we fully shut down the ZkDB and reload from disk, but the current behavior in 3.5 and 3.6 does not fully shut down the DB, a nice optimization to reduce the unavailable time with large snapshots.

Internally, we have another version of the 'Retain DB' implementation, and we caught this issue with the digest feature we just upstreamed and fixed it internally. We just realized we haven't upstreamed that fix; this is the Jira for that issue, and a PR will follow soon.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
9 weeks ago 0|z09vi8:
ZooKeeper ZOOKEEPER-3657

Implementing snapshot schedule to avoid high latency issue due to disk contention

New Feature Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 19/Dec/19 16:16   28/Jan/20 08:30       server   0 3 0 5400   If a ZK server is running on a machine with a single disk drive, the snapshot and txn fsync threads will have disk IO contention (even on SSD). A majority of servers taking snapshots at the same time will affect the txn fsync time, and hence the end-to-end update and read latency.

To provide a better SLA guarantee and improve the write throughput with large snapshots (> 3GB), a snapshot scheduler was implemented internally to avoid a majority of servers taking snapshots at the same time, which provides a better latency guarantee.

A new quorum packet type SNAPPING is introduced in this feature. The leader sends this packet to the followers periodically, like PING but less frequently. Followers send their current status back (e.g. the number of txns since the last snapshot, fsync latency), and the leader decides who should take a snapshot.

On a follower, safe snapshot mode is enabled when the leader is sending SNAPPING; in this mode the follower only takes a snapshot when its txn count is much larger than the threshold defined for SyncRequestProcessor. This avoids issues like a follower accumulating too many txns before it is scheduled to take a snapshot.
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 6 days ago 0|z09uzk:
ZooKeeper ZOOKEEPER-3656

SyncRequestProcessor doesn't update lastFlushTime correctly on observers

Bug Resolved Major Fixed Unassigned Eric Hammerle Eric Hammerle 18/Dec/19 19:36   07/Jan/20 14:39 07/Jan/20 14:39   3.7.0     0 2 0 1200   This issue was introduced in [ZOOKEEPER-3311|https://github.com/apache/zookeeper/pull/851]. The lastFlushTime used to decide the batch window is not updated correctly in the observer case, where nextProcessor is always null.

This can cause observers to fall behind and their sync queue to grow indefinitely.
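The intended bookkeeping can be sketched with a stand-in class (not the actual SyncRequestProcessor): the flush timestamp has to be updated on every flush, independently of whether a downstream processor exists:

```java
// Stand-in for the batch-window bookkeeping described above.
class FlushTrackerSketch {
    private long lastFlushTime;
    private final Runnable nextProcessor; // null in the observer case

    FlushTrackerSketch(Runnable nextProcessor) {
        this.nextProcessor = nextProcessor;
        this.lastFlushTime = System.currentTimeMillis();
    }

    void flush() {
        // ... fsync the txn log here ...
        // Update the timestamp even when nextProcessor == null (the observer case).
        lastFlushTime = System.currentTimeMillis();
        if (nextProcessor != null) {
            nextProcessor.run();
        }
    }

    long getLastFlushTime() {
        return lastFlushTime;
    }
}
```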
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 2 days ago 0|z09to0:
ZooKeeper ZOOKEEPER-3655

Fix QuorumSSLTest to remove hardcoded localhost and port

Bug Open Minor Unresolved Unassigned Kishor Patil Kishor Patil 18/Dec/19 15:36   02/Mar/20 07:31   3.5.5, 3.5.6   server   0 1 0 4800   Depending on the localhost and /etc/hosts configuration, this test can fail, so this change tries to make it more resilient.

I have put up the patch on https://github.com/apache/zookeeper/pull/1188
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 1 day ago 0|z09tbs:
ZooKeeper ZOOKEEPER-3654

Incorrect *_CFLAGS handling in Automake

Bug Open Major Unresolved Damien Diederen Damien Diederen Damien Diederen 18/Dec/19 09:56   23/Jan/20 11:38       c client   0 1 0 1200   The {{Makefile.am}} distributed with the C client defines some per-target {{\*_CFLAGS}} and {{\*_CXXFLAGS}} variables. These, however, do not reference {{AM_CFLAGS}} (resp. {{AM_CXXFLAGS}}), which means that some options (notably {{-Wall}}) are missing when building subsets of the code.

As the [Automake docs|https://www.gnu.org/software/automake/manual/html_node/Program-and-Library-Variables.html] put it:

{quote}
In compilations with per-target flags, the ordinary ‘AM_’ form of
the flags variable is _not_ automatically included in the
compilation (however, the user form of the variable _is_ included).
So for instance, if you want the hypothetical ‘maude’ compilations
to also use the value of ‘AM_CFLAGS’, you would need to write:

maude_CFLAGS = ... your flags ... $(AM_CFLAGS)
{quote}

Restoring the flags, however, causes compilation failures (in the library) and a slew of new warnings (in the tests) which had not been noticed because of the missing options. These errors/warnings have to be fixed before the flags can be tightened up.

(I have a preliminary patch, and am planning to submit a "pull request" soon.)
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 1 day ago 0|z09t1k:
ZooKeeper ZOOKEEPER-3653

Audit Log feature fails in a stand alone zookeeper setup

Bug Resolved Major Fixed Sujith Simon Sujith Simon Sujith Simon 17/Dec/19 08:12   19/Dec/19 03:55 19/Dec/19 03:55   3.6.0, 3.7.0 audit   0 1 0 1200   When the Audit Log feature is enabled in a standalone zookeeper setup, an error stating "Failed to audit log request" pops up with an EndOfFile exception, caused by a deserialization issue in AuditHelper.java when request.request.slice() returns an empty pointer. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
13 weeks ago 0|z09r9c:
ZooKeeper ZOOKEEPER-3652

Improper synchronization in ClientCnxn

Bug Open Major Unresolved Unassigned Sylvain Wallez Sylvain Wallez 16/Dec/19 09:31   08/Mar/20 08:35   3.5.6   java client   1 2 0 3600   ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in {{outgoingQueue}} and draining it when the client connection isn't alive.

There are several issues with this approach:
- the value of the {{state}} field is not stable, meaning we don't always synchronize on the same object.
- the {{state}} field is an enum value, and enum values are global objects. So in an application with several ZooKeeper clients connected to different servers, this causes contention between unrelated clients.

An easy fix is to change those {{synchronized(state)}} statements to {{synchronized(outgoingQueue)}}, since that queue is local to each client and is what we want to coordinate.

I'll be happy to prepare a PR with the above change if this is deemed to be the correct way to fix it.
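The contention point can be demonstrated with plain stand-in objects (not the actual ClientCnxn code): enum constants are JVM-global, so unrelated clients synchronizing on one share a monitor, while a per-instance queue gives each client its own:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Stand-in illustrating the difference between the two candidate locks.
class ClientSketch {
    enum State { CONNECTING, CONNECTED, CLOSED }

    private volatile State state = State.CONNECTING; // enum constants are global objects
    private final Queue<String> outgoingQueue = new ArrayDeque<>(); // one per client

    Object stateLock() { return state; }         // shared by all clients in this state
    Object queueLock() { return outgoingQueue; } // unique to this client
}
```

Two ClientSketch instances return the same object from stateLock() but distinct objects from queueLock(), which is why synchronizing on the queue removes the cross-client contention.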

 

Another issue that makes contention worse is {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep statement needed, and can we remove it?

 
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 6 days ago 0|z09pxk:
ZooKeeper ZOOKEEPER-3651

NettyServerCnxnFactoryTest is flaky

Bug Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 16/Dec/19 05:25   17/Dec/19 03:18 16/Dec/19 16:13 3.5.6 3.6.0, 3.7.0     0 1 0 4800   NettyServerCnxnFactoryTest is flaky; it fails from time to time on Jenkins.

e.g. [https://builds.apache.org/view/ZK%20All/job/zookeeper-master-maven/557/org.apache.zookeeper$zookeeper/testReport/org.apache.zookeeper.server/NettyServerCnxnFactoryTest/testOutstandingHandshakeLimit/]

 
{code:java}
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.zookeeper.server.NettyServerCnxnFactoryTest
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.339 s <<< FAILURE! - in org.apache.zookeeper.server.NettyServerCnxnFactoryTest
[ERROR] testOutstandingHandshakeLimit(org.apache.zookeeper.server.NettyServerCnxnFactoryTest)  Time elapsed: 6.569 s  <<< FAILURE!
java.lang.AssertionError:

Expected: is <true>
     but: was <false>
	at org.apache.zookeeper.server.NettyServerCnxnFactoryTest.testOutstandingHandshakeLimit(NettyServerCnxnFactoryTest.java:142)
{code}
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 3 days ago committed to master branch as 20daae7d5fa934629e7825ed72e66ad76a94d6aa

committed also to branch-3.6

But I cannot merge it to branch-3.5 as the patch does not apply cleanly
0|z09pmo:
ZooKeeper ZOOKEEPER-3650

zero or overflown xid disrupts the session

Bug Open Major Unresolved Unassigned Pavel Lobach Pavel Lobach 16/Dec/19 03:07   16/Feb/20 22:15   3.4.14, 3.5.6   c client, java client, server   0 3   This is a follow-up ticket for [https://github.com/alexguan/node-zookeeper-client/issues/100]

I found that the above nodejs ZK client (a 100% pure JS implementation) starts its XID counter from 0 in requests, which leads to really strange behaviour when throttling happens on the ZK server side - please check it out for more details; it is interesting.

The above client will hopefully be fixed, but the problem is actually still valid (partially) for the native Java and C client implementations:

Java client: a change was made here to avoid xid overflow (ZOOKEEPER-3253), but I don't see it merged into the latest 3.4.14 release - are there any plans for this change to make it into a 3.4.x release?

C client: overflow is not checked, and there is another problem with the starting value. Here is the code for the single-threaded implementation (the MT variant uses the same logic):
{code:c}
// make sure the static xid is initialized before any threads started
__attribute__((constructor)) int32_t get_xid()
{
    static int32_t xid = -1;
    if (xid == -1) {
        xid = time(0);
    }
    return fetch_and_add(&xid, 1);
}
{code}
The starting value is chosen to be time(0), the current Unix epoch time. It will overflow in the future on its own, making the C client (and all implementations using this library as a dependency) completely out of order some day (and I can even tell you the exact date :)). And as time passes, this window (time() .. overflow) shrinks every day, making the range available for xid values smaller and smaller, so problems will start to happen earlier for clients making a large number of requests without session re-establishment.
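The arithmetic can be checked with Java's 32-bit int, which mirrors the C client's int32_t: a counter seeded with the epoch time silently wraps to a negative value once it passes 2^31 - 1, and the later the seed, the smaller the usable window:

```java
// Illustration of the 32-bit xid overflow described above.
class XidOverflowSketch {
    // Remaining increments before a counter seeded with 'epochSeconds' wraps negative.
    static long incrementsUntilOverflow(int epochSeconds) {
        return (long) Integer.MAX_VALUE - epochSeconds;
    }

    // Java int arithmetic wraps just like the unchecked C increment.
    static int nextXid(int xid) {
        return xid + 1; // Integer.MAX_VALUE + 1 == Integer.MIN_VALUE
    }
}
```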

One more thing to note here:

Why is XID=0 considered an invalid value (check the above ticket for details)? It is not stated anywhere except in the ZK server code itself, which decrements queued requests only for positive XIDs (the excerpt below is from the tip of the master branch):
{code:java}
// will be called from zkServer.processPacket
public void decrOutstandingAndCheckThrottle(ReplyHeader h) {
    if (h.getXid() <= 0) {
        return;
    }
    if (!zkServer.shouldThrottle(outstandingCount.decrementAndGet())) {
        enableRecv();
    }
}
{code}
Apart from that, requests with xid=0 get through fine. Can we change the condition to h.getXid() < 0?

And one last thing: is there any documentation for the ZK wire protocol, explaining the fields' meanings and allowed values? I did not find one and had to reverse engineer the logic...
Maybe it's a good idea to create it, so there will not be such misunderstandings and discrepancies in clients' implementations?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
4 weeks, 3 days ago 0|z09pf4:
ZooKeeper ZOOKEEPER-3649

ls -s CLI need a line break

Improvement Resolved Minor Fixed Rabi Kumar K C maoling maoling 14/Dec/19 05:42   10/Jan/20 12:08 09/Jan/20 16:52   3.6.0, 3.7.0 scripts   0 1 0 3000   {code:java}
[zk: localhost:2181(CONNECTED) 7] ls -s /
[test, test-12-10, zookeeper]cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x92
cversion = 7
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 3[zk: localhost:2181(CONNECTED) 8]
{code}
What we want is:
{code:java}
[zk: localhost:2181(CONNECTED) 7] ls -s /
[test, test-12-10, zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x92
cversion = 7
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 3
[zk: localhost:2181(CONNECTED) 8]
{code}
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks ago 0|z09okw:
ZooKeeper ZOOKEEPER-3648

remove Hadoop logo in the ZooKeeper documentation

Improvement Resolved Major Fixed Rabi Kumar K C maoling maoling 13/Dec/19 01:56   10/Jan/20 12:08 23/Dec/19 10:50   3.6.0 documentation   0 2 0 1800   +1
I do see a BookKeeper gif file in the repo; we should remove that one too.
On my wish list: redesign of the ZooKeeper logo :)
On Thu, Dec 12, 2019 at 7:44 AM Enrico Olivelli <eolivelli@gmail.com> wrote:
> +1
>
> Maybe we should also check if we have old pages about Bookkeeper project.
> It was a subproject of ZK but now it is a (great) top level independent
> project
>
> Enrico
>
> On Thu, 12 Dec 2019 at 16:38, Flavio Junqueira <fpj@apache.org> wrote:
>
> > ZooKeeper was a subproject of Hadoop in the early Apache days, and we
> > still carry that flag... ;-)
> >
> > -Flavio
> >
> > > On 12 Dec 2019, at 16:16, Norbert Kalmar <nkalmar@cloudera.com.INVALID>
> > > wrote:
> > >
> > > Oh, wow, I didn't even notice that until now.
> > > Makes sense, knowing a lot of the time ZK is used "standalone" (I mean
> > > outside of any hadoop ecosystem).
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Thu, Dec 12, 2019 at 2:52 PM Flavio Junqueira <fpj@apache.org>
> > > wrote:
> > >
> > >> Should we remove that Hadoop logo from the documentation? It has been a
> > >> while that we aren't a subproject of Hadoop any longer.
> > >>
> > >> -Flavio
> >
> >
>
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 3 days ago 0|z09n3k:
ZooKeeper ZOOKEEPER-3647

Exception in thread "main" java.lang.NoClassDefFoundError: com/codahale/metrics/Reservoir

Bug Resolved Trivial Won't Fix Unassigned Prachi Prakash Prachi Prakash 12/Dec/19 16:17   16/Feb/20 22:27 16/Feb/20 22:27     build   0 2   After building successfully, I tried to run the ZooKeeperServerMain class with zoo_sample.cfg and got the following exception:

{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: com/codahale/metrics/Reservoir
	at org.apache.zookeeper.metrics.impl.DefaultMetricsProvider$DefaultMetricsContext.lambda$getSummary$2(DefaultMetricsProvider.java:126)
	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
	at org.apache.zookeeper.metrics.impl.DefaultMetricsProvider$DefaultMetricsContext.getSummary(DefaultMetricsProvider.java:122)
	at org.apache.zookeeper.server.ServerMetrics.<init>(ServerMetrics.java:74)
	at org.apache.zookeeper.server.ServerMetrics.<clinit>(ServerMetrics.java:44)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:132)
	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112)
	at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67)
Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.Reservoir
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 8 more
{code}



Can anyone give guidance on how to rectify this? It uses {{<dropwizard.version>3.2.5</dropwizard.version>}}, not 4.x.

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
4 weeks, 3 days ago 0|z09mmw:
ZooKeeper ZOOKEEPER-3646

Executing multiple commands non-interactively with the C client cli

Improvement Open Major Unresolved Unassigned Mate Szalay-Beko Mate Szalay-Beko 11/Dec/19 09:12   06/Jan/20 06:57       c client   0 2   With the current C CLI client, we can execute a single command, using the {{--cmd}} option.
However, when one wants to execute multiple commands in a session from a bash script, this is not really possible right now.

The idea is to allow executing a command file, with a ZooKeeper C client CLI command on each line. It would also be great to support reading the commands from standard input, so one can use standard unix pipes to channel commands to the zookeeper client. This would allow more dynamic generation of commands from scripts.

What about the following syntax?
{code:bash}
# execute a single command (this is working now)
cli_mt --host localhost:2181 --cmd 'ls /zookeeper'

# new: execute a list of commands specified in a file
cli_mt --host localhost:2181 --cmd ./zk_commands.txt

# new: read the commands from stdin
cat ./zk_commands.txt | cli_mt --host localhost:2181 --cmd
{code}
Or if it is easier / nicer, we can add a new parameter and not use {{--cmd}} for the last two cases.

Please assign the ticket to yourself if you start working on it (or leave a comment).
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 3 days ago 0|z09kgw:
ZooKeeper ZOOKEEPER-3645

C CLI should close session properly

Bug Resolved Minor Duplicate Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 11/Dec/19 06:06   12/Dec/19 09:03 12/Dec/19 09:03 3.5.6       0 2   Whenever you quit the C command line client, you get a warning in the server logs like:

```
2019-12-11 10:28:40,973 [myid:] - WARN [NIOWorkerThread-6:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /0:0:0:0:0:0:0:1:51026, session = 0x1000012817e001c
	at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
	at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
	at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```

This happens no matter how you exit from the C command-line client. E.g.:
- using Ctrl-C
- using the `quit` command
- using the `--cmd` option and executing a single command
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks ago 0|z09k7s:
ZooKeeper ZOOKEEPER-3644

Data loss after upgrading standalone ZK server 3.4.14 to 3.5.6 with snapshot.trust.empty=true

Bug Closed Blocker Fixed Michael Han Manikumar Manikumar 10/Dec/19 10:38   14/Feb/20 10:23 05/Jan/20 16:19 3.5.6 3.6.0, 3.5.7, 3.7.0 server   2 6 0 8400   We tried to upgrade a single node *standalone* ZK server from 3.4.14 to 3.5.6. There were no snapshot files, so as suggested in ZOOKEEPER-3056, we set snapshot.trust.empty to true. After server startup, when we tried to list the znodes, we found that znodes were missing.

Steps to reproduce:
# Start a single node ZK 3.4.14 server and create few znodes
# Upgrade the server to 3.5.6 with  snapshot.trust.empty=true config
# try to list the znodes using zkShell

Looking into the [source code|https://github.com/apache/zookeeper/blob/release-3.5.6/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L224], it looks like we do not read the transaction log if there are no snapshot files and snapshot.trust.empty is set to true.
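A simplified model of the reported behaviour (schematic only; the real FileTxnSnapLog logic is more involved):

```java
// Schematic of the restore decision: with snapshot.trust.empty=true and no
// snapshot file, the existing transaction log is never replayed.
class RestoreSketch {
    static String restore(boolean snapshotFound, boolean trustEmptySnapshot, boolean txnLogPresent) {
        if (!snapshotFound) {
            if (txnLogPresent && !trustEmptySnapshot) {
                return "refuse-to-start"; // safety check from ZOOKEEPER-3056
            }
            // Treated as a genuinely empty DB: the txn log is skipped, losing data.
            return "empty-db";
        }
        return "load-snapshot-then-replay-txnlog";
    }
}
```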

ZK 3.5.6 logs:
{quote}[2019-12-07 12:13:35,007] INFO Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
 (org.apache.zookeeper.server.ZooKeeperServer)
[2019-12-07 12:13:35,012] INFO Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory)
[2019-12-07 12:13:35,014] INFO Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 12 worker threads, and 64 kB direct buffers. (org.apache
.zookeeper.server.NIOServerCnxnFactory)
[2019-12-07 12:13:35,017] INFO binding to port [0.0.0.0/0.0.0.0:2181|http://0.0.0.0/0.0.0.0:2181] (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2019-12-07 12:13:35,027] INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2019-12-07 12:13:35,029] DEBUG Created new input stream /var/lib/zookeeper/version-2/log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)
[2019-12-07 12:13:35,031] DEBUG Created new input archive /var/lib/zookeeper/version-2/log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)
[2019-12-07 12:13:35,035] DEBUG EOF exception java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)
[2019-12-07 12:13:35,035] WARN No snapshot found, but there are log entries. This should only be allowed during upgrading. (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2019-12-07 12:13:35,035] INFO Snapshotting: 0x0 to /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2019-12-07 12:13:35,036] INFO Snapshotting: 0x0 to /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2019-12-07 12:13:35,050] INFO Using checkIntervalMs=60000 maxPerMinute=10000 (org.apache.zookeeper.server.ContainerManager)
[2019-12-07 12:15:07,137] DEBUG Accepted socket connection from /[127.0.0.1:38888|http://127.0.0.1:38888/] (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2019-12-07 12:15:07,144] DEBUG Session establishment request from client /[127.0.0.1:38888|http://127.0.0.1:38888/] client's lastZxid is 0x0 (org.apache.zookeeper.server.ZooKeeperServer)
[2019-12-07 12:15:07,145] DEBUG Adding session 0x100006e15fb0000 (org.apache.zookeeper.server.SessionTrackerImpl)
[2019-12-07 12:15:07,148] TRACE SessionTrackerImpl — Adding session 0x100006e15fb0000 30000 (org.apache.zookeeper.server.SessionTrackerImpl)
[2019-12-07 12:15:07,149] DEBUG Client attempting to establish new session: session = 0x100006e15fb0000, zxid = 0x0, timeout = 30000, address = /[127.0.0.1:38888|http://127.0.0.1:38888/] (org.apache.zookeeper.server.ZooKeeperServer)
[2019-12-07 12:15:07,155] TRACE :Psessionid:0x100006e15fb0000 type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-12-07 12:15:07,155] TRACE SessionTrackerImpl — Existing session 0x100006e15fb0000 30000 (org.apache.zookeeper.server.SessionTrackerImpl)
[2019-12-07 12:15:07,155] INFO Creating new log file: log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)
[2019-12-07 12:15:07,170] DEBUG Processing request:: sessionid:0x100006e15fb0000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a (org.apache.zookeeper.server.FinalRequestProcessor)
{quote}
100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 4 days ago 0|z09iyo:
ZooKeeper ZOOKEEPER-3643

Testing and documenting secure and unsecure ZK client connection from the same JVM

Test In Progress Major Unresolved Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 10/Dec/19 04:25   16/Jan/20 10:01           0 1 0 5400   We are working on the ZooKeeper SSL integration in HBase. By default, one can enable ZooKeeper SSL client connections using Java System Properties. However, there are certain use-cases where we need to connect to two ZooKeeper quorums from the same JVM (e.g. when connecting to two HBase clusters for data synchronization). It is possible that one of the ZooKeeper quorums uses SSL while the other doesn't.

In this case it is not possible to use Java System Properties, as those will affect both ZooKeeper client connections. These use-cases require code modifications (e.g. in HBase) to use custom ZooKeeper client configurations. We need to add a unit test in ZooKeeper to verify that this works, and it also makes sense to document this use-case to help other open source projects start using ZooKeeper SSL.
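The scoping problem can be illustrated with stdlib types only (a schematic stand-in, not the actual ZooKeeper client API): a per-client configuration object isolates settings in a way that a JVM-wide system property cannot:

```java
import java.util.Properties;

// Stand-in contrasting JVM-wide system properties with per-client configuration.
class ZkClientStub {
    private final Properties config;

    ZkClientStub(Properties perClientConfig) {
        this.config = perClientConfig;
    }

    boolean isSecure() {
        // The per-client value wins; only fall back to the JVM-wide property.
        String v = config.getProperty("client.secure",
                System.getProperty("client.secure", "false"));
        return Boolean.parseBoolean(v);
    }
}
```

With this shape, one client in the JVM can be secure while another connects in plaintext, which is the behaviour the unit test and documentation should pin down.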
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks, 2 days ago 0|z09igg:
ZooKeeper ZOOKEEPER-3642

Data inconsistency when the leader crashes right after sending SNAP sync

Bug Open Major Unresolved Fangmin Lv Alex Mirgorodskiy Alex Mirgorodskiy 09/Dec/19 16:13   26/Feb/20 16:51   3.6.0, 3.5.5, 3.5.6, 3.7.0   server   0 4 0 3000   Linux 4.19.29 x86_64

If the leader crashes after sending a SNAP sync to a learner, but before sending the NEWLEADER message, the learner will not save the snapshot to disk. But it will advance its lastProcessedZxid to the one from the snapshot (call it zxid X).

A new leader will get elected, and it will resync our learner again immediately. But this time, it will use the incremental DIFF method, starting from Zxid X. A DIFF-based resync does not trigger snapshots, so the learner is still holding the original snapshot purely in memory. If the learner restarts after that, it will silently lose all the data up to Zxid X.

An easy way to reproduce is to insert System.exit into LearnerHandler.java right before sending the NEWLEADER message (on the one instance that is currently running the leader, but not the others):
{noformat}
LOG.debug("Sending NEWLEADER message to " + sid);
+ if (leader.self.getId() == 1 && sid == 3) {
+     LOG.debug("Bail when server.1 resyncs server.3");
+     System.exit(0);
+ }
{noformat}
If I remember right, the repro steps are as follows, using that patch in a 4-instance ensemble where server.3 is an Observer, the rest are voting members, and server.1 is the current Leader:
# Start server.3 after the other instances are up. It will get the initial snapshot from server.1, and server.1 will stop immediately because of the patch.
# Say server.2 takes over as the new Leader. Server.3 will receive a DIFF resync from server.2, but will skip persisting the snapshot.
# A subsequent restart of server.3 will make that instance come up with a blank data tree.

The above steps assumed that server.3 is an Observer, but it can presumably happen for voting members too. Just need a 5-instance ensemble.

Our workaround is to take the snapshot unconditionally on receiving NEWLEADER:
{noformat}
- if (snapshotNeeded) {
+ // Take the snapshot unconditionally. The first leader may have crashed
+ // after sending us a SNAP, but before sending NEWLEADER. The second leader will
+ // send us a DIFF, and we'd still like to take a snapshot, even though
+ // the upstream code used to skip it.
+ if (true || snapshotNeeded) {
      zk.takeSnapshot();
  }
{noformat}
This is what 3.4.x series used to do. But I assume it is not the ideal fix, since it essentially disables the "snapshotNeeded" optimization.
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
5 weeks, 1 day ago 0|z09huw:
ZooKeeper ZOOKEEPER-3641

New ZOO_VERSION define breaks Perl & Python contribs

Bug Resolved Major Fixed Unassigned Damien Diederen Damien Diederen 09/Dec/19 04:22   10/Dec/19 06:22 10/Dec/19 06:21 3.6.0 3.6.0 c client   0 2 0 6600   ZOOKEEPER-3635 changed the versioning scheme for the C client from integer-valued {{ZOO_\{MAJOR,MINOR,PATCH\}_VERSION}} definitions to a single string-valued {{#define ZOO_VERSION "3.6.0"}}.

This causes the Perl and Python contribs to fail to build.

(I'm looking into it.)
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks, 3 days ago 0|z09grc:
ZooKeeper ZOOKEEPER-3640

Implement "batch mode" in cli_mt

Improvement Resolved Minor Fixed Damien Diederen Damien Diederen Damien Diederen 09/Dec/19 03:24   09/Jan/20 04:56 06/Jan/20 07:05   3.6.0, 3.7.0 c client   0 2 0 9600   While testing an unrelated pull request, [~symat] noticed that the {{cmd:}} "batch mode" argument ({{\-c}}/{{--cmd}} with {{getopt_long}}) was ignored by {{cli_mt}} (as opposed to {{cli_st}}, which does the expected thing):

https://github.com/apache/zookeeper/pull/1131#issuecomment-561631843

It turns out that "batch mode" was never implemented for the {{THREADED}} case.

https://github.com/apache/zookeeper/commit/36ed46fc726fd#diff-419c86bcc09a6b28f27161879deed603R488-R492
100% 100% 9600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 3 days ago 0|z09go8:
ZooKeeper ZOOKEEPER-3639

Unexpected exception causing CommitProcessor to exit

Bug Open Major Unresolved Unassigned sunfeifei sunfeifei 06/Dec/19 02:32   11/Dec/19 21:37           0 1   zk_version 3.4.6--1, built on 06/02/2015 12:00 GMT
CentOS release 6.5 (Final)
Linux 2.6.32-431.20.3.el6
java version "1.7.0_76"
2019-12-01 21:46:16,537 [myid:161] - ERROR [CommitProcessor:161:CommitProcessor@148] - Unexpected exception causing CommitProcessor to exit
java.lang.NullPointerException
at org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:250)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:120)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2019-12-04 21:46:16,537 [myid:161] - INFO [CommitProcessor:161:CommitProcessor@150] - CommitProcessor exited loop!



2019-12-01 21:46:36,616 [myid:161] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /192.168.40.20:14973 as it has seen zxid 0x1119e271e9 our last zxid is 0x1119e271e7 client must try another server
2019-12-01 21:46:36,617 [myid:161] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /192.168.53.50:51241 as it has seen zxid 0x1119e271e9 our last zxid is 0x1119e271e7 client must try another server
2019-12-01 21:46:36,631 [myid:161] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /192.168.164.94:32532 as it has seen zxid 0x1119e274a5 our last zxid is 0x1119e271e7 client must try another server
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks, 6 days ago 0|z09czc:
ZooKeeper ZOOKEEPER-3638

Update Jetty to 9.4.24.v20191120

Improvement Closed Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 05/Dec/19 13:06   14/Feb/20 10:23 06/Jan/20 13:16   3.6.0, 3.5.7, 3.7.0     0 1 0 4800   Jetty should be updated to the latest version (9.4.24.v20191120) to pick up some CVE fixes. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
10 weeks, 3 days ago 0|z09c9c:
ZooKeeper ZOOKEEPER-3637

Fix wrong haveDelivered implementation

Bug Open Minor Unresolved Unassigned maoling maoling 05/Dec/19 05:58   28/Dec/19 09:48       leaderElection, server   0 2 0 1200   100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
15 weeks ago 0|z09bmg:
ZooKeeper ZOOKEEPER-3636

Restore the configuration properties missing from the zookeeperAdmin page since the move from xml to markdown

Improvement Resolved Minor Fixed maoling maoling maoling 01/Dec/19 04:41   17/Dec/19 14:13 17/Dec/19 14:13 3.6.0 3.6.0 documentation   0 1 0 3000   100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 2 days ago 0|z0960w:
ZooKeeper ZOOKEEPER-3635

Use Docker and Maven Release Plugin to prepare ZooKeeper releases

Task In Progress Major Unresolved Enrico Olivelli Enrico Olivelli Enrico Olivelli 29/Nov/19 18:26   14/Dec/19 06:06   3.6.0 3.7.0 build   0 2 0 11400   In 3.5.5 and 3.5.6 we followed a new release procedure based on Maven:

[https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToRelease+using+maven]

That procedure needed some "manual" parts to change the project version inside pom files and also inside the sources of the C Client.

We can automate more of this in order to make the release procedure mostly automatic.

We should also use 'docker' in order to have a reproducible build environment, especially for the 'convenience binaries':
* Java version (we want to build the project with Java 8 in 3.6.0)
* C client (tools, system headers and openssl version)

 
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
15 weeks, 6 days ago 0|z095lc:
ZooKeeper ZOOKEEPER-3634

Why does a huge ZooKeeper snapshot cause a waitForEpochAck timeout?

Bug Open Blocker Unresolved Unassigned zechao zheng zechao zheng 29/Nov/19 02:52   20/Dec/19 02:41   3.4.5       0 2   h4. Question

After a large number of znodes are created, ZooKeeper servers in the cluster become faulty and cannot be automatically recovered or restarted.

Logs of the follower:
2016-06-23 08:00:18,763 | WARN | QuorumPeer[myid=26](plain=/10.16.9.138:24002)(secure=disabled) | Exception when following the leader | org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:156)
at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:276)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1094)
Logs of the leader:
2016-06-23 07:30:57,481 | WARN | QuorumPeer[myid=25](plain=/10.16.9.136:24002)(secure=disabled) | Unexpected exception | org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1108)
java.lang.InterruptedException: Timeout while waiting for epoch to be acked by quorum
at org.apache.zookeeper.server.quorum.Leader.waitForEpochAck(Leader.java:1221)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:487)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1105)
2016-06-23 07:30:57,482 | INFO | QuorumPeer[myid=25](plain=/10.16.9.136:24002)(secure=disabled) | Shutdown called | org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:623)
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:623)
at org.apache.zookeeper.server.quorum.QuorumPeer.stopLeader(QuorumPeer.java:1149)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1110)
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 6 days ago 0|z094o0:
ZooKeeper ZOOKEEPER-3633

AdminServer commands throw NPE when only secure client port is used

Bug Closed Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 28/Nov/19 11:08   14/Feb/20 10:23 03/Dec/19 04:02 3.5.5, 3.5.6 3.6.0, 3.5.7     0 1 0 5400   *thanks to Mike Smotritsky for reporting this bug!*

when only secureClientPort is defined in the config and there is no regular clientPort, both the {{stat}} and the {{conf}} commands result in a 500 Server Error caused by NullPointerExceptions. The problem is that no {{serverCnxnFactory}} is defined in the {{ZooKeeperServer}} in this case; we have only {{secureServerCnxnFactory}}.

see the attached stack traces for the exceptions (reproduced on the current master branch)

The stat and conf admin commands should actually provide info about both secure and insecure connections, and should handle the case when either of these is missing.
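A sketch of the null-safe shape those commands could take (plain-Java stand-ins with invented names, not the actual AdminServer code): report whichever factories exist instead of dereferencing one unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an admin command should tolerate either connection
// factory being absent (plain-only, secure-only, or both configured).
class ConnectionInfo {
    static List<String> describe(Object plainFactory, Object secureFactory) {
        List<String> out = new ArrayList<>();
        if (plainFactory == null && secureFactory == null) {
            out.add("no client port configured");
            return out;
        }
        if (plainFactory != null)  out.add("plain: " + plainFactory);
        if (secureFactory != null) out.add("secure: " + secureFactory);
        return out;
    }
}
```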
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
16 weeks ago 0|z0948o:
ZooKeeper ZOOKEEPER-3632

Create Maven-based Jenkins job to verify trunk with Java 13

Task Open Major Unresolved Unassigned Andor Molnar Andor Molnar 25/Nov/19 03:46   25/Nov/19 03:46       build   0 1   Replacing: [https://builds.apache.org/view/ZK%20All/job/ZooKeeper-trunk-java13/] 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 3 days ago 0|z08yw8:
ZooKeeper ZOOKEEPER-3631

Create Maven-based Jenkins jobs for Windows CMake build

Task Open Major Unresolved Unassigned Andor Molnar Andor Molnar 25/Nov/19 03:45   25/Nov/19 03:45       build   0 1   Create a new Jenkins job on trunk to run the Maven Windows CMake build.

Replacing: https://builds.apache.org/view/ZK%20All/job/ZooKeeper-trunk-windows-cmake/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 3 days ago 0|z08yw0:
ZooKeeper ZOOKEEPER-3630

Autodetection of SSL library during Zookeeper C client build

Improvement Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 25/Nov/19 03:40   15/Dec/19 08:17 14/Dec/19 03:31 3.5.6 3.6.0 c client   0 2 0 11400   After submitting [https://github.com/apache/zookeeper/pull/1107] about SSL support in ZooKeeper C client, [~ztzg] shared some very good improvement ideas, so we will now:
- use the {{--with-openssl}} autoconf argument in the same way as SASL does (i.e. by default we autodetect the openssl library, but with the option we can turn off SSL or specify a custom location for the openssl lib)
- add and document a custom maven parameter that will be used during the c-client build
- try to apply the same logic to the Windows build (cmake)
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 5 days ago 0|z08yvk:
ZooKeeper ZOOKEEPER-3629

Add a new metric to detect clock skew

Improvement Open Major Unresolved maoling maoling maoling 23/Nov/19 00:42   14/Dec/19 06:09     3.7.0 metric system, server   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 5 days ago 0|z08xt4:
ZooKeeper ZOOKEEPER-3628

Build failing on branch-3.4 with Java 8

Bug Open Major Unresolved Unassigned Andor Molnar Andor Molnar 22/Nov/19 05:27   22/Nov/19 05:27   3.4.14   server   0 1   branch34_openjdk8 has been failing since build #504. No patch had been submitted at that time, so it must be some infra problem.

The following job has failed:

[https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk8/504/]

with the following error message:
{noformat}
javax.security.sasl.SaslException: Failed to initialize authentication mechanism using SASL [Caused by javax.security.auth.login.LoginException: SASL-authentication failed because the specified JAAS configuration section 'QuorumLearnerInvalid' could not be found.]
at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthLearner.<init>(SaslQuorumAuthLearner.java:72)
at org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.createAndStartManager(QuorumCnxManagerTest.java:739)
at org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.createAndStartManager(QuorumCnxManagerTest.java:728)
at org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.testAuthLearnerBadCredToAuthRequiredServerWithHigherSid(QuorumCnxManagerTest.java:286)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
at java.util.concurrent.FutureTask.run(java.base@9-internal/FutureTask.java:266)
at java.lang.Thread.run(java.base@9-internal/Thread.java:804)
Caused by: javax.security.auth.login.LoginException: SASL-authentication failed because the specified JAAS configuration section 'QuorumLearnerInvalid' could not be found.
at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthLearner.<init>(SaslQuorumAuthLearner.java:63){noformat}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 6 days ago 0|z08wvk:
ZooKeeper ZOOKEEPER-3627

Update Jackson to 2.9.10.1 and the Owasp plugin to 5.2.4

Improvement Closed Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 21/Nov/19 06:29   14/Feb/20 10:23 17/Dec/19 07:57   3.5.7     0 1 0 9000   Jackson databind should be updated to 2.9.10.1 to pick up a CVE fix. This task also includes updating the owasp maven plugin to the latest version, which helps to detect CVE issues. 100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 2 days ago 0|z08va8:
ZooKeeper ZOOKEEPER-3626

API Docs not available

Wish Open Major Unresolved Unassigned tianweijiang tianweijiang 21/Nov/19 01:06   20/Feb/20 21:05           0 2   [http://zookeeper.apache.org/doc/current/api/index.html]
h1. Not Found

The requested URL was not found on this server.

 

I can't get any details for a command option like {{delquota [-n|-b] path}} (what are {{-n}} and {{-b}}?).

[http://zookeeper.apache.org/doc/current/index.html]
* [API Docs|http://zookeeper.apache.org/doc/current/api/index.html] - the technical reference to ZooKeeper Client APIs

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 6 days ago 0|z08utk:
ZooKeeper ZOOKEEPER-3625

Add Automatic-Module-Name to MANIFEST.MF

Improvement Open Major Unresolved Unassigned Michael Miller Michael Miller 20/Nov/19 15:38   21/Nov/19 11:39           0 3   Add Automatic-Module-Name to the project jars in support of the Java 9 module system.  This can be done using the maven-jar-plugin: 
{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifestEntries>
        <Automatic-Module-Name>org.apache.zookeeper</Automatic-Module-Name>
      </manifestEntries>
    </archive>
  </configuration>
</plugin>
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks ago 0|z08uf4:
ZooKeeper ZOOKEEPER-3624

testFailedTxnAsPartOfQuorumLoss is flaky again

Improvement Open Major Unresolved Unassigned Mate Szalay-Beko Mate Szalay-Beko 20/Nov/19 02:56   20/Nov/19 02:56           0 1   Yesterday the Maven PreCommit job failed for me for two unrelated PRs. In both cases the failing test case was {{org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss}}

Re-running the precommit job helped, so I think the test is flaky.

 
{code:java}
java.lang.AssertionError: create /zk2 should have failed
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss(QuorumPeerMainTest.java:822){code}

see the build logs here: [https://pastebin.com/9h6MF1Sh]

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 1 day ago 0|z08tj4:
ZooKeeper ZOOKEEPER-3623

Build failure due to flaky tests on ppc64le

Bug Open Major Unresolved Unassigned Siddhesh Ghadi Siddhesh Ghadi 20/Nov/19 02:26   20/Nov/19 13:23   3.5.5   server   0 2   travis:
os: linux
arch: ppc64le

local:
os: rhel 7.6
arch: ppc64le
The build fails on Travis as well as in the local env for ppc64le due to some flaky tests. Only very occasionally does the build pass. Also, when the failing tests are run individually, they pass.

 

Travis logs:

Failed build: [https://api.travis-ci.org/v3/job/613931066/log.txt]

Passed build: [https://api.travis-ci.org/v3/job/614361405/log.txt]
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
17 weeks, 1 day ago 0|z08ti0:
ZooKeeper ZOOKEEPER-3622

ZooKeeper 3.5.6 Quorum TLS protocol issues

Bug Open Minor Unresolved Unassigned Kelly Schoenhofen Kelly Schoenhofen 16/Nov/19 21:38   17/Nov/19 20:39   3.5.6   server   0 3   Using 3.5.6 I have quorum TLS working, but I'm being asked to tighten up from the default of AES128 & TLS 1.2. I've tried the following in the zoo.cfg:

ssl.quorum.protocol=TLSv1.3

This is apparently not supported yet - is this dependent on the version of openssl on the system, or is this just not an option I can specify? Where can I find the list of protocols that are recognized? If 1.3 is not yet available, not the end of the world.

-ssl.quorum.ciphersuites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384-

-This is not a recognized cipher, neither is AES256/SHA256. The above cipher _should_ be available though, and is the stronger successor to AES128/SHA256.-

-I have the suspicion that I'm setting it wrong, because if I set it to the cipher it defaults to when unset:-

-ssl.quorum.ciphersuites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256-

-Gives me this when cluster members try to connect:-

-2019-11-16 19:39:33,731 [myid:1] - INFO [xxx/x.x.x.x:3888:UnifiedServerSocket$UnifiedSocket@273] - Accepted TLS connection from xxx/x.x.x.x:40822 - NONE - SSL_NULL_WITH_NULL_NULL-
-2019-11-16 19:39:33,732 [myid:1] - WARN [xxx/x.x.x.x:3888:QuorumCnxManager@542] - Exception reading or writing challenge: {}-

 

(the only alteration I made to the above snippet is changing the machine names to xxx and ip's to x.x.x.x, I altered it in no other way)

So two questions:

1) is tls 1.3 an option?

2) what is the cipher list? I would like an aes256 option. 

Update: So I removed all my changes and I kept getting the SSL_NULL_WITH_NULL_NULL error. I tore everything down, put it all back together, and still got SSL_NULL. Started again with just the first two nodes, very slowly, picking over the log, and then I noticed the initial error: the name in the cert didn't match the name of the server. When I set up the reverse lookup zone in DNS on Friday, I had set the FQDN properly, but over the weekend (while zk hummed along fine) the zone populated and overwrote everything with just the machine names, removing the FQDN. Hence the name not matching.
I manually added the FQDN to the entries, rebooted the servers, and they started working.
Since I had been getting SSL_NULL back when I moved off of TLSv1.3 and tried just AES256-SHA384, I tried that again, and it works fine:

2019-11-16 22:20:09,346 [myid:2] - INFO [LearnerHandler-/x.x.x.x:43548:UnifiedServerSocket$UnifiedSocket@273] - Accepted TLS connection from xxx/x.x.x.x:43548 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

So, this is less of a bug and more of a request - is TLS 1.3 an option, and how can I get a cipher list? I have AES256-SHA384 so that's acceptable to the SecOps where I work. 
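The protocol and cipher-suite names that the {{ssl.quorum.protocol}} and {{ssl.quorum.ciphersuites}} settings accept come from the JVM's JSSE provider, so one way to answer both questions for a given JDK is to enumerate what it supports via the standard {{javax.net.ssl}} API (not ZooKeeper-specific; which suites appear depends on the JDK version and policy files):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import java.util.Arrays;

public class ListTls {
    public static void main(String[] args) throws Exception {
        // Ask the default JSSE provider what it can do.
        SSLParameters p = SSLContext.getDefault().getSupportedSSLParameters();
        System.out.println("protocols: " + Arrays.toString(p.getProtocols()));
        // Check for the AES-256 suite the reporter wants:
        boolean hasAes256 = Arrays.asList(p.getCipherSuites())
                .contains("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384");
        System.out.println("AES-256-GCM available: " + hasAes256);
    }
}
```

If "TLSv1.3" is absent from the protocol list, the JDK itself does not offer it, regardless of what zoo.cfg says.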
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 4 days ago 0|z08pls:
ZooKeeper ZOOKEEPER-3621

Fix a potential bug in session clean-up under clock skew

Bug Open Major Unresolved maoling maoling maoling 16/Nov/19 02:46   16/Nov/19 02:46   3.6.0   server   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 5 days ago 0|z08oww:
ZooKeeper ZOOKEEPER-3620

Allow to override calls to System.exit in server side code

Improvement Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 15/Nov/19 15:08   28/Nov/19 06:27 28/Nov/19 06:24 3.6.0 3.6.0 server, tests   0 2 0 6600   Calling System.exit kills the JVM, and this is very annoying for:
* ZooKeeper own server side tests
* Tests of downstream applications

We should provide a way to supply an alternative implementation of System.exit.

 

We can also re-enable the 'DM_EXIT' spotbugs rule, which prevents code from accidentally calling System.exit.
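One common shape of such an indirection, sketched here with invented names (the actual ZooKeeper API may differ): route all exits through a replaceable handler so tests can intercept the exit code instead of having the JVM killed.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch of a System.exit indirection.
class ExitHandler {
    // Production default really exits; tests swap in a recording handler.
    static volatile IntConsumer handler = System::exit;

    static void requestExit(int code) {
        handler.accept(code);
    }
}
```

A test would set {{ExitHandler.handler}} to a lambda that records the code, then assert on it after exercising the server code path.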
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks ago 0|z08okg:
ZooKeeper ZOOKEEPER-3619

Implement server side semaphore API to improve the efficiency and throughput of coordination

New Feature Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 15/Nov/19 13:20   20/Dec/19 02:49   3.6.0 3.7.0 server   0 3   The ZK API is designed to be simple, flexible and general; it can serve scenarios ranging from coordination and member health tracking to metadata storage.

But this general design has a cost: it leads to heavy and inefficient client code for recipes like distributed locks and semaphores.

Currently, the general client-side semaphore implementation (without wait time) works as follows:
# client A creates sequential and ephemeral node N-1
# client B creates sequential and ephemeral node N-2
# clients A and B query all children and check whether they hold the lock node with the smallest sequential id
# since client A has the smaller sequential id, it is the semaphore owner (assume the semaphore value is 1)
# client B will delete its node, close the session, and probably try again later from step 2

All the contenders issue 4 writes (create session, create lock, delete lock, close session) and 1 read (get children), which is pretty heavy and does not scale well.
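The recipe above can be modeled in a few lines: given each contender's sequence number, the holders are simply the contenders with the smallest sequence numbers, up to the semaphore value. A plain-Java sketch (not ZooKeeper code; names invented):

```java
import java.util.*;

// Minimal model of the client-side recipe: contenders create sequential nodes
// and the holders are those with the smallest sequence numbers.
class SemaphoreModel {
    static Set<String> owners(Map<String, Integer> nodeSeqs, int permits) {
        return nodeSeqs.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .limit(permits)
                .map(Map.Entry::getKey)
                .collect(java.util.stream.Collectors.toCollection(LinkedHashSet::new));
    }
}
```

Moving this decision to the leader means losers can be failed at txn-preparation time, avoiding the four quorum writes each loser currently costs.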

We actually hit this issue internally for one heavy semaphore use case, and we have to create dozens of ensembles to support their traffic.

To make the semaphore recipe more efficient, we can move the semaphore implementation to the server side, where the leader has all the context about who will win the semaphore/lock at txn preparation time, and can short-circuit and fail a contender directly without proposing and committing those create/delete lock transactions.

To implement this, we need to add a new semaphore API, which is meant to replace client-side locks, leader election (semaphore value 1), and general semaphore use cases.

We started to design and implement it recently; it will be based on another big improvement we have almost finished and will soon upstream in ZOOKEEPER-3594, which skips proposing requests with error transactions.

Meanwhile, we'd like to hear some early feedback from the community about this feature.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 6 days ago 0|z08o34:
ZooKeeper ZOOKEEPER-3618

Send batch quorum Ack and Commit packets to improve the efficiency and throughput of ZK

Improvement Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 15/Nov/19 12:42   14/Dec/19 06:06   3.6.0 3.7.0 server   0 3   ZK guarantees that txns will be flushed to disk in order, and we are doing batch flushes to improve disk IO efficiency and throughput, but ACKs are still sent back one by one, which is not efficient; instead, we can send the ACK for the last flushed txn to the leader in batch mode.

On the leader, when it receives the ACK for txn N, the flush-ordering guarantee implies that all txns before N have been flushed to disk as well, so they are all ACKed. The leader can then maintain a (SID -> last ACKed ZXID) map to calculate the latest COMMIT ZXID, and send that to all learners.
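A sketch of that leader-side bookkeeping (invented class, not ZooKeeper code): each ACK advances a per-SID high-water mark, and the commit point is the highest zxid that a quorum of the ensemble has flushed.

```java
import java.util.*;

// Sketch: track each server's last ACKed zxid and derive the highest zxid
// a majority has flushed. Zxids are simplified to plain longs here.
class AckTracker {
    private final Map<Long, Long> lastAcked = new HashMap<>(); // sid -> zxid
    private final int ensembleSize;

    AckTracker(int ensembleSize) { this.ensembleSize = ensembleSize; }

    void ack(long sid, long zxid) {
        // A batched ACK for zxid N implicitly ACKs everything before N.
        lastAcked.merge(sid, zxid, Math::max);
    }

    // Highest zxid ACKed by a majority of the ensemble, or -1 if no quorum yet.
    long committedZxid() {
        int quorum = ensembleSize / 2 + 1;
        if (lastAcked.size() < quorum) return -1;
        List<Long> z = new ArrayList<>(lastAcked.values());
        z.sort(Collections.reverseOrder());
        return z.get(quorum - 1);
    }
}
```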

Based on the ordering guarantee, when a learner receives COMMIT for txn N, it means all the txns before it have been committed.

The main benefit we can get from this feature is reduced memory pressure, GC, and quorum communication effort on all servers, and reduced lock contention on the leader when processing ACK, COMMIT, etc.

Overall, this will improve the efficiency of ZK, and we expect it to support higher throughput for write traffic.

The main challenge of this work is making sure it is backward compatible and safe for gradual rollout, while ensuring it won't affect the correctness/durability of txns during dynamic reconfig.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 6 days ago 0|z08o28:
ZooKeeper ZOOKEEPER-3617

ZK digest ACL permissions gets overridden

Bug Open Major Unresolved maoling Vrinda Davda Vrinda Davda 15/Nov/19 00:28   29/Feb/20 03:49   3.4.9, 3.5.5   security, server   0 3   I was able to add one user with /crdwa/ access.
The moment I add another user with read-only access (/r/), the first user (/user1/)
gets overridden with read-only access. Please see the output below:

 
{code:java}
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]  addauth digest user1:password1
[zk: localhost:2181(CONNECTED) 1] setAcl /newznode auth:user1:password1:crwad
cZxid = 0xe
ctime = Thu Nov 07 13:29:43 IST 2019
mZxid = 0xe
mtime = Thu Nov 07 13:29:43 IST 2019
pZxid = 0xe
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 8
numChildren = 0
[zk: localhost:2181(CONNECTED) 2] getAcl /newznode
'digest,'user1:XDkd2dsEuhc9ImU3q8pa8UOdtpI=
: cdrwa
[zk: localhost:2181(CONNECTED) 3] addauth digest user2:password2
[zk: localhost:2181(CONNECTED) 4] setAcl /newznode auth:user2:password2:r
cZxid = 0xe
ctime = Thu Nov 07 13:29:43 IST 2019
mZxid = 0xe
mtime = Thu Nov 07 13:29:43 IST 2019
pZxid = 0xe
cversion = 0
dataVersion = 0
aclVersion = 2
ephemeralOwner = 0x0
dataLength = 8
numChildren = 0
[zk: localhost:2181(CONNECTED) 5] getAcl /newznode
'digest,'user1:XDkd2dsEuhc9ImU3q8pa8UOdtpI=
: r
'digest,'user2:lo/iTtNMP+gEZlpUNaCqLYO3i5U=
: r
{code}
 

If I setAcl for both users at the same time, I get both users duplicated, one with read-only and another with cdrwa permissions, as below:

 
{code:java}
[zk: localhost:2181(CONNECTED) 1] getAcl /zk_test
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 2]  addauth digest user1:password1
[zk: localhost:2181(CONNECTED) 3] addauth digest user2:password2
[zk: localhost:2181(CONNECTED) 5]
setAcl /zk_test auth:user2:password2:r,auth:user1:password1:cdrwa  
cZxid = 0x2
ctime = Wed Nov 13 20:14:08 IST 2019
mZxid = 0x2
mtime = Wed Nov 13 20:14:08 IST 2019
pZxid = 0x2
cversion = 0
dataVersion = 0
aclVersion = 2
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
[zk: localhost:2181(CONNECTED) 7] getAcl /zk_test
'digest,'user1:XDkd2dsEuhc9ImU3q8pa8UOdtpI=
: r
'digest,'user2:lo/iTtNMP+gEZlpUNaCqLYO3i5U=
: r
'digest,'user1:XDkd2dsEuhc9ImU3q8pa8UOdtpI=
: cdrwa
'digest,'user2:lo/iTtNMP+gEZlpUNaCqLYO3i5U=
: cdrwa
{code}
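Until the semantics are fixed, a client-side workaround is to read-modify-write the ACL instead of replacing it: fetch the current list with getAcl, append the new entry, and set the merged list back. A minimal sketch of the merge step, with plain strings standing in for {{org.apache.zookeeper.data.ACL}} (names invented):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: grant a second user without clobbering the first by merging the new
// entry into the ACL read back from the znode instead of replacing the list.
class AclMerge {
    static List<String> merge(List<String> existing, String newEntry) {
        List<String> merged = new ArrayList<>(existing);
        if (!merged.contains(newEntry)) {
            merged.add(newEntry); // skip exact duplicates
        }
        return merged;
    }
}
```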
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks, 5 days ago 0|z08n1s:
ZooKeeper ZOOKEEPER-3616

ZOOKEEPER-3282 add a new documentation: zookeeperCodingGuide.md

Sub-task In Progress Minor Unresolved maoling maoling maoling 14/Nov/19 05:54 19/Mar/20 11:53 17/Nov/19 23:01       documentation   0 1 0 600   100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
18 weeks ago 0|z08lug:
ZooKeeper ZOOKEEPER-3615

Write a TLA+ specification to verify the Zab protocol

Wish Open Major Unresolved maoling maoling maoling 13/Nov/19 20:48   13/Nov/19 20:49       documentation, server   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
18 weeks ago 0|z08l6o:
ZooKeeper ZOOKEEPER-3614

Limiting the number of ephemeral nodes per session

New Feature Open Major Unresolved Unassigned Eric Lee Eric Lee 13/Nov/19 20:17   14/Feb/20 09:51           0 1 0 13800   ZooKeeper suffers when a session has too many ephemeral nodes associated with it. In the case where the session expires, the session expiration message passed amongst the quorum is too large because of all of the ephemeral paths within it. This Jira introduces a change that allows a limit to be provided as a JVM flag. 100% 100% 13800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
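A sketch of the proposed check (the property name below is invented for illustration; the real flag is whatever the patch defines): read a per-session cap from a JVM flag and reject ephemeral creates once the cap is reached.

```java
// Hypothetical sketch: enforce a per-session cap on ephemeral nodes,
// configured via a JVM flag; a limit of 0 or less means "no limit".
class EphemeralLimiter {
    private final int limit;
    private int count;

    EphemeralLimiter() {
        // "zookeeper.maxEphemeralNodes" is an invented flag name.
        this(Integer.getInteger("zookeeper.maxEphemeralNodes", 0));
    }

    EphemeralLimiter(int limit) { this.limit = limit; }

    // Returns false (reject the create) once the session hits the cap.
    boolean tryCreateEphemeral() {
        if (limit > 0 && count >= limit) return false;
        count++;
        return true;
    }
}
```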
18 weeks ago 0|z08l5s:
ZooKeeper ZOOKEEPER-3613

ZKConfig fails to return proper value on getBoolean() when user accidentally includes spaces at the end of the value

Bug Closed Minor Fixed Sujith Simon Scott Guminy Scott Guminy 12/Nov/19 13:14   14/Feb/20 10:23 20/Jan/20 03:10 3.5.5 3.6.0, 3.5.7, 3.7.0 server   0 3 0 4200   I was using ZooKeeper client in WebSphere Liberty and attempting to configure SSL/TLS for client connections.

To do so, I must add the system property {{zookeeper.client.secure=true}}.  In WebSphere Liberty, java system properties are placed in a file called bootstrap.properties - each property on a separate line.  I accidentally added a space at the end of the line.  When {{ZKConfig.getBoolean()}} attempted to convert this string to a {{boolean}}, it returned {{false}} due to the space at the end.

{{ZKConfig.getBoolean()}} should trim the string before attempting to convert to a boolean.
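The proposed fix is essentially a one-liner; sketched here outside of ZKConfig for clarity (the helper class is invented, the behavior is the one the ticket asks for):

```java
// Sketch of the proposed fix: trim before parsing so a trailing space does not
// silently flip the value to false.
class BoolProp {
    static boolean getBoolean(String raw, boolean defaultValue) {
        if (raw == null) return defaultValue;
        return Boolean.parseBoolean(raw.trim());
    }
}
```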
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 3 days ago 0|z08iug:
ZooKeeper ZOOKEEPER-3612

CLONE - Update lib prototype.js: 1.4.0_pre4 due to security vulnerability

Bug Patch Available Major Unresolved Unassigned Kevin Moultry Kevin Moultry 12/Nov/19 07:10   20/Dec/19 00:35       documentation   0 2   The zookeeper package includes, in zookeeper-docs\skin, the lib prototype.js in version 1.4.0_pre4. There is a known security vulnerability, CVE-2008-7220. Can you please upgrade to 1.6.0.2 or higher? Thanks. 1 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
18 weeks, 1 day ago 1
Incompatible change
1 0|z08i6o:
ZooKeeper ZOOKEEPER-3611

addWatch API supports all watch event types; strengthen the corresponding CLI and add documentation

Improvement In Progress Major Unresolved maoling maoling maoling 11/Nov/19 05:54   17/Dec/19 04:37           0 2 0 11400   100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
18 weeks, 2 days ago 0|z08gpc:
ZooKeeper ZOOKEEPER-3610

Update lib prototype.js: 1.4.0_pre4 due to security vulnerability

Bug Open Major Unresolved Unassigned DW DW 07/Nov/19 12:20   13/Nov/19 03:16   3.4.14   documentation   0 1   The ZooKeeper package ships lib prototype.js at version 1.4.0_pre4 in zookeeper-docs\skin. There is a known security vulnerability, CVE-2008-7220. Can you please upgrade to 1.6.0.2 or higher? Thanks. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks ago 0|z08d60:
ZooKeeper ZOOKEEPER-3609

Update lib yui-min: 3.1.0 due to security vulnerability

Bug Open Major Unresolved Unassigned DW DW 07/Nov/19 12:13   07/Nov/19 12:18   3.4.14   contrib   0 1   The ZooKeeper package ships lib yui-min.js at version 3.1.0 in zookeeper-contrib\zookeeper-contrib-loggraph\src\resources\webapp\org\apache\zookeeper\graph\resources. There is a known security vulnerability, CVE-2013-4939. Can you please upgrade to a higher version? Thanks. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks ago 0|z08d5c:
ZooKeeper ZOOKEEPER-3608

Add documentation about currentEpoch and acceptedEpoch

Improvement Open Minor Unresolved maoling maoling maoling 07/Nov/19 05:33   17/Nov/19 05:10       documentation, leaderElection, server   0 2   Users may be confused by the two variables *acceptedEpoch and currentEpoch* introduced by this ticket.

The implementation up to version 3.3.3 did not include the epoch variables *acceptedEpoch and currentEpoch*. This omission generated problems in production and was noticed by many ZooKeeper clients.

− *acceptedEpoch*: the epoch number of the last *NEWEPOCH* message accepted;
− *currentEpoch*: the epoch number of the last *NEWLEADER* message accepted;

The origin of this problem is at the beginning of *Recovery* Phase, when the leader increments its epoch (contained in *lastZxid*) even before acquiring a quorum of successfully connected followers (such leader is called *false leader*). Since a follower goes back to *FLE* if its epoch is larger than the leader’s epoch, when a *false leader* drops leadership and becomes a follower of a leader from a previous epoch, it finds a smaller epoch and goes back to FLE. This behavior can loop, switching from *Recovery* Phase to *FLE* repeatedly.
Consequently, using *lastZxid* to store the epoch number, there is no distinction between a *tried* epoch and a *joined* epoch in the implementation. Those are the respective purposes of *acceptedEpoch and currentEpoch*, hence omitting them produces the problems above.

More details can be found in this report paper: _*ZooKeeper’s atomic broadcast protocol: Theory and practice. André Medeiros, March 20, 2012*_
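To illustrate why the two variables must be separate, a minimal sketch (illustrative names and logic only, not ZooKeeper source) of how a follower could track a "tried" epoch independently of a "joined" epoch:

```java
// Illustrative sketch (not ZooKeeper source): a follower keeps the "tried"
// epoch (acceptedEpoch, advanced on NEWEPOCH) separate from the "joined"
// epoch (currentEpoch, set on NEWLEADER), so a false leader that falls back
// to follower does not compare against the wrong epoch and loop into FLE.
public class EpochTracker {
    long acceptedEpoch; // epoch of the last NEWEPOCH message accepted
    long currentEpoch;  // epoch of the last NEWLEADER message accepted

    // On NEWEPOCH(e): accept only if e advances the tried epoch.
    boolean onNewEpoch(long e) {
        if (e > acceptedEpoch) {
            acceptedEpoch = e;
            return true;
        }
        return false;
    }

    // On NEWLEADER(e): the follower has actually joined epoch e.
    void onNewLeader(long e) {
        currentEpoch = e;
    }
}
```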
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks ago 0|z08cpk:
ZooKeeper ZOOKEEPER-3607

Potential data inconsistency due to the inconsistency between ZKDatabase.committedLog and dataTree in Trunc sync.

Bug Open Critical Unresolved Unassigned Jiafu Jiang Jiafu Jiang 06/Nov/19 02:33   13/Mar/20 02:44   3.4.14   quorum   1 3   I will describe the problem with a detailed example.

1. Suppose we have three zk servers: zk1, zk2, and zk3. zk1 and zk2 are online, zk3 is offline, zk1 is the leader.

2. In TRUNC sync, zk1 sends a TRUNC request to zk2, then sends the remaining proposals in the committedLog. *When the follower zk2 receives the proposals, it applies them directly to the dataTree, but not to the committedLog.*

3. After the data sync phase, zk1 may continue to send zk2 more committed proposals, and they will be applied to both the datatree and the committedLog of zk2.

4. Then zk1 fails, zk3 restarts successfully, zk2 becomes the leader.

5. The leader zk2 sends a TRUNC request to zk3, then the remaining proposals from the committedLog. But since some proposals, which came from the leader zk1 in TRUNC sync (as described above), are not in the committedLog, they will not be sent to zk3.

6. Now data inconsistency happens between zk2 and zk3, since some data may exist in zk2's datatree, but not zk3's datatree.
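The divergence in steps 2–6 can be shown with a toy model (hypothetical class and method names; not ZooKeeper code): proposals applied only to the data tree during TRUNC sync never reach a later follower that syncs from the committedLog.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reported bug: TRUNC-sync proposals update the data tree
// but are never added to committedLog, so a later follower that syncs from
// committedLog misses them.
public class TruncSyncModel {
    final List<String> dataTree = new ArrayList<>();
    final List<String> committedLog = new ArrayList<>();

    // Step 2 (buggy behavior): proposals received during TRUNC sync go to
    // the data tree only.
    void applyFromTruncSync(String proposal) {
        dataTree.add(proposal);
    }

    // Step 3: normally committed proposals go to both structures.
    void applyCommitted(String proposal) {
        dataTree.add(proposal);
        committedLog.add(proposal);
    }

    public static void main(String[] args) {
        TruncSyncModel zk2 = new TruncSyncModel();
        zk2.applyFromTruncSync("p1"); // from zk1 during TRUNC sync
        zk2.applyCommitted("p2");     // after the sync phase
        // Step 5: zk3 syncs from zk2's committedLog and never sees p1.
        List<String> zk3Tree = new ArrayList<>(zk2.committedLog);
        System.out.println(zk2.dataTree.equals(zk3Tree)); // false
    }
}
```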
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 days ago 0|z08ax4:
ZooKeeper ZOOKEEPER-3606

add JMXHOSTNAME to zkServer.sh to enable user to change the exposed hostname of jmx service

Improvement Resolved Minor Fixed Unassigned Chia-Ping Tsai Chia-Ping Tsai 06/Nov/19 00:06   17/Dec/19 08:36 17/Dec/19 07:59   3.6.0     0 2 0 4800   The variable "JMXHOSTNAME" introduces the JMX argument "java.rmi.server.hostname" in zkServer.sh. The argument can change the address exposed by the JMX service, which is useful when ZooKeeper is deployed in a container. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
13 weeks, 2 days ago 0|z08as0:
ZooKeeper ZOOKEEPER-3605

ZOOKEEPER-3242 added a connection throttle; the default constructor needs to set it

Bug Resolved Major Fixed Unassigned Jordan Zimmerman Jordan Zimmerman 02/Nov/19 10:18   06/Nov/19 14:09 06/Nov/19 09:21 3.6.0 3.6.0 server   0 2 0 2400   ZOOKEEPER-3242 added a connection throttle. It gets set in the main constructor but not in the alternate constructor, which is breaking Apache Curator's testing framework. It should also be set in the alternate constructor to avoid an NPE. 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
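A common way to avoid this class of NPE is to have the alternate constructor delegate to the main constructor so the throttle field is initialized on every path. A hedged sketch with made-up names (not the actual ZooKeeperServer source):

```java
// Sketch of the fix pattern (hypothetical class, not the actual
// ZooKeeperServer source): initialize the throttle in one place so every
// constructor path sets it and no code path sees a null field.
public class ServerWithThrottle {
    static class ConnectionThrottle {
        boolean allow() { return true; }
    }

    private final ConnectionThrottle throttle;

    // Main constructor: sets the throttle.
    ServerWithThrottle(ConnectionThrottle throttle) {
        this.throttle = throttle;
    }

    // Alternate constructor: delegates instead of leaving the field null.
    ServerWithThrottle() {
        this(new ConnectionThrottle());
    }

    boolean acceptConnection() {
        return throttle.allow(); // would NPE if throttle were never set
    }
}
```

Declaring the field `final` also lets the compiler reject any constructor path that forgets to set it.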
19 weeks, 1 day ago 0|z0874o:
ZooKeeper ZOOKEEPER-3604

Improve the treatment of path parameter for ZooKeeper.sync method

Improvement Open Minor Unresolved Unassigned Jingguo Yao Jingguo Yao 02/Nov/19 08:55   02/Nov/19 08:58       documentation, java client   0 1   The ZooKeeper.sync method has a parameter called path, but the meaning of path is not documented in the Javadoc, and path is unused by the sync operation. Even if we want to keep it for compatibility, we should document its meaning. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks, 5 days ago 0|z0873k:
ZooKeeper ZOOKEEPER-3603

Zookeeper (3.5.6) not starting on mac

Bug Open Major Unresolved Unassigned Suraj Kumar Agrahari Suraj Kumar Agrahari 01/Nov/19 10:06   18/Nov/19 00:26   3.5.6   build   0 2   Mac 10.14. I downloaded ZooKeeper 3.5.6 and it was unable to start because the zookeeper_server.pid file was not being created.

The problem was in the line *ZOO_DATADIR="$(echo -e "${ZOO_DATADIR}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"*

 

After removing the -e option from the echo command, it started successfully.
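The likely cause is that on macOS's /bin/sh, `echo` does not treat `-e` as an option and prints it literally, corrupting the trimmed value. A portable alternative (a sketch of the workaround, not necessarily the committed fix) is to use `printf` instead of `echo -e`:

```shell
# Portable whitespace trimming without `echo -e` (sketch of the workaround,
# not necessarily the committed fix). On shells where echo takes no options,
# `echo -e "..."` emits a literal "-e " prefix, breaking the derived paths.
ZOO_DATADIR="  /var/lib/zookeeper  "
ZOO_DATADIR="$(printf '%s' "${ZOO_DATADIR}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
echo "$ZOO_DATADIR"
```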
triaged 9223372036854775807 No Perforce job exists for this issue. 3 9223372036854775807
Important
17 weeks, 3 days ago 0|z086d4:
ZooKeeper ZOOKEEPER-3602

Add ability to toggle nagios checks in charm

Improvement Resolved Major Duplicate Unassigned John Losito John Losito 01/Nov/19 09:39   01/Nov/19 09:41 01/Nov/19 09:40         0 1   Currently, there isn't a way one can toggle certain nagios checks that the zookeeper charm creates when adding the following relation:
{code:java}
juju add-relation zookeeper:local-monitors npre:local-monitors
{code}
It would be nice if one could disable certain checks via the charm's configuration options. For instance, if one wanted to turn off the max latency check, one could do so using the following:
{code:java}
juju config zookeeper max_latency_check=false
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks, 6 days ago 0|z086aw:
ZooKeeper ZOOKEEPER-3601

Introduce the fault injection framework Byteman for ZooKeeper

New Feature Open Major Unresolved maoling maoling maoling 01/Nov/19 04:18   21/Jan/20 12:17     3.7.0 documentation, server   0 1 0 2400   [https://www.datastax.com/blog/2016/02/cassandra-unit-testing-byteman] 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks, 6 days ago 0|z08620:
ZooKeeper ZOOKEEPER-3600

support the complete Linearizability Read

New Feature In Progress Major Unresolved maoling maoling maoling 31/Oct/19 06:44   13/Feb/20 21:48     3.7.0 documentation, java client, server   0 1 0 600   100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
20 weeks ago 0|z084qg:
ZooKeeper ZOOKEEPER-3599

cli.c: Resuscitate "old-style" argument parsing

Improvement Resolved Minor Fixed Unassigned Damien Diederen Damien Diederen 31/Oct/19 05:01   11/Dec/19 10:07 11/Dec/19 10:06 3.6.0 3.6.0     0 1 0 10200   Patches adding functionality to the C client library often want to make these new capabilities available/explorable via the {{cli.c}} shell.

Examples include ZOOKEEPER-1112 (SASL; the original patchset included a duplicated {{cli_sasl.c}}) and ZOOKEEPER-2122 (SSL; the current PR switches it to use {{getopt}}).

This ticket is about adding /optional/ {{getopt}} support to {{cli.c}} without breaking existing uses, and would be a prerequisite for extensions which "require" new flags.
100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 1 day ago 0|z084lc:
ZooKeeper ZOOKEEPER-3598

Fix potential data inconsistency issue due to CommitProcessor not shutting down gracefully

Bug Reopened Critical Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 29/Oct/19 15:23 19/Mar/20 11:48 18/Nov/19 07:41   3.6.0   server   0 3 0 13200    
A regression was introduced when the write was inlined in CommitProcessor by the changes in ZOOKEEPER-3359: the code didn't wait for the in-flight write to finish before calling shutdown on the nextProcessor.
 
So it's possible that the CommitProcessor thread and the QuorumPeer thread will update the DataTree concurrently if we're doing fastForwardDataBase at the end of ZooKeeperServer.shutdown, which can cause data inconsistency.
 
This JIRA makes sure we wait for the CommitProcessor to shut down gracefully before calling shutdown on the next processor, and exit if we cannot finish gracefully, to avoid potential inconsistency.
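The general shutdown-ordering pattern can be sketched as follows (hypothetical names; a simplified stand-in for the CommitProcessor, not its actual code): drain in-flight writes before propagating shutdown downstream.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the shutdown-ordering fix (hypothetical names, not the actual
// CommitProcessor code): wait for in-flight writes to drain before shutting
// down the next processor, so two threads never mutate the tree at once.
public class GracefulCommitStage {
    private final ExecutorService workers = Executors.newSingleThreadExecutor();

    void submitWrite(Runnable write) {
        workers.submit(write);
    }

    // Returns true only if all in-flight writes finished in time; the caller
    // should call nextProcessor.shutdown() only after this returns true.
    boolean shutdownGracefully(long timeoutMs) throws InterruptedException {
        workers.shutdown(); // stop accepting new writes
        return workers.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```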
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 6 days ago Committed to master branch as 79f99af81842f415b97e1c3c18c953df5bd129b2

(I had a problem with the script and JIRA)
0|z082ag:
ZooKeeper ZOOKEEPER-3597

Add ability to toggle nagios checks in charm

Improvement Resolved Major Invalid Unassigned John Losito John Losito 29/Oct/19 10:54   01/Nov/19 09:41 01/Nov/19 09:41         0 1   Currently, there isn't a way one can toggle certain nagios checks that the zookeeper charm creates when adding the following relation:
{code:java}
juju add-relation zookeeper:local-monitors npre:local-monitors
{code}
It would be nice if one could turn off or disable certain checks via the charm's configuration options. For instance, if one wanted to turn off the max latency check, one can do so using the following:
{code:java}
juju config zookeeper max_latency_check=false{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
19 weeks, 6 days ago 0|z081zk:
ZooKeeper ZOOKEEPER-3596

Add ability to change limits for nagios checks in charm

Improvement Resolved Major Invalid Unassigned John Losito John Losito 29/Oct/19 10:31   01/Nov/19 09:36 01/Nov/19 09:36         0 2   Currently, there is no way to change the limits for any nagios checks in the zookeeper charm. It would be nice if users could modify these limits, with the defaults set to whatever is currently hardcoded. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
20 weeks ago 0|z081yg:
ZooKeeper ZOOKEEPER-3595

Fsync parameter for serialize method is ignored

Improvement Resolved Minor Fixed Jingguo Yao Jingguo Yao Jingguo Yao 29/Oct/19 02:49   03/Dec/19 13:37 03/Dec/19 13:13   3.6.0 server   0 2 0 1200   [ZOOKEEPER-2872: Interrupted snapshot sync causes data loss|https://github.com/apache/zookeeper/commit/0706b40afad079f19fe9f76c99bbb7ec69780dbd] introduced an fsync parameter for the serialize method. But this parameter is ignored in [FileSnap.java|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileSnap.java#L232]. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
15 weeks, 2 days ago 0|z081eg:
ZooKeeper ZOOKEEPER-3594

Ability to skip proposing requests with error transactions

New Feature Open Major Unresolved Vladimir Ivić Vladimir Ivić Vladimir Ivić 27/Oct/19 20:31   22/Dec/19 11:52   3.5.6   quorum, server   0 4 0 12600   Ensembles with a high write request rate could skip proposing requests that contain error transactions instead of having the quorum vote on them.

For example, sizable write traffic that results in error transactions creates additional network and log disk space overhead for requests that would only return errors to the client in the end (e.g. deleting a non-existent path, a version mismatch, trying to create a node that already exists, etc.).

Currently, there is no such logic in the ProposalRequestProcessor; every request that comes out of PrepRequestProcessor will be proposed to the quorum.

Proposed solution workflow:
* A client sends a new write request trying to setData on a non-existent node
* The server accepts the request and sends it through the PrepRequestProcessor
* PrepRequestProcessor detects the error and assigns the error transaction to the request
* Between PrepRequestProcessor and ProposalRequestProcessor there is another processor named SkipRequestProcessor, whose sole responsibility is to decide whether the request will be forwarded or returned to the originating quorum peer (Leader or Learner).
* The quorum peer waits for all previous requests to complete before the error request proceeds with echoing the error back to the client.

Requirements: 
* We should be conservative about the use of ZXID. If the request generates an error transaction we don't want to increment the last proposed ZXID and cause any gaps in the log.
* Requests that are found to be invalid should be sent directly to the origin (either the corresponding Learner or the Leader itself) so there is exactly one round trip for each request with an error transaction.
* All requests must preserve their order; the changes must be backward compatible with ZooKeeper's ordering guarantees.

Challenges:
* Skipping requests without having them go through the proposal pipeline poses a challenge for preserving ZooKeeper's transaction order.
* Avoiding any additional changes inside CommitProcessor if possible.
* Having unified logic for all three paths through which write requests can come:
## Via the Leader, placed directly into the PrepRequestProcessor
## Via a Follower, where the request is forwarded to the Leader and also passed to CommitProcessor to wait for the COMMIT packet
## Via an Observer, where the request is forwarded to the Leader and the Observer waits for the INFORM packet
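The core routing decision of the proposed SkipRequestProcessor can be sketched like this (the processor is a proposal in this ticket, not existing ZooKeeper code; all names are illustrative):

```java
// Sketch of the proposed SkipRequestProcessor decision (a proposal in this
// ticket, not existing ZooKeeper code): requests whose prepared transaction
// is an error are returned to the origin instead of being proposed to the
// quorum, saving the proposal round trip and a ZXID.
public class SkipDecision {
    enum Route { PROPOSE_TO_QUORUM, RETURN_TO_ORIGIN }

    static class PreparedRequest {
        final boolean hasErrorTxn; // would be set by PrepRequestProcessor
        PreparedRequest(boolean hasErrorTxn) { this.hasErrorTxn = hasErrorTxn; }
    }

    // Sits between PrepRequestProcessor and ProposalRequestProcessor.
    static Route route(PreparedRequest r) {
        return r.hasErrorTxn ? Route.RETURN_TO_ORIGIN : Route.PROPOSE_TO_QUORUM;
    }
}
```

The hard part, per the challenges above, is not this predicate but keeping returned requests ordered relative to proposed ones; the sketch deliberately leaves that out.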
100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks, 1 day ago 0|z07zuo:
ZooKeeper ZOOKEEPER-3593

Fix the default value of jute.maxbuffer on the client side and optimize the documentation

Improvement Resolved Minor Fixed maoling maoling maoling 27/Oct/19 01:36   12/Mar/20 02:15 03/Nov/19 20:03 3.6.0, 3.5.5, 3.5.6 3.6.0     0 3 0 3600    
{code:java}
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:java.io.tmpdir=/var/folders/kj/092gpj_s2hvdgx77c9ghqdv00000gp/T/
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:java.compiler=<NA>
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.name=Mac OS X
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.arch=x86_64
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.version=10.13.6
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:user.name=wenba
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:user.home=/Users/wenba
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:user.dir=/Users/wenba/workspaces/workspace_learning/YCSB
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.memory.free=234MB
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.memory.max=3641MB
2019-10-27 18:08:41 INFO ZooKeeper:109 - Client environment:os.memory.total=245MB
2019-10-27 18:08:41 INFO ZooKeeper:868 - Initiating client connection, connectString=127.0.0.1:2180/bm sessionTimeout=30000 watcher=site.ycsb.db.zookeeper.ZKClient$SimpleWatcher@4b506000
2019-10-27 18:08:41 INFO X509Util:79 - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2019-10-27 18:08:41 INFO ClientCnxnSocket:237 - jute.maxbuffer value is 4194304 Bytes
FAILED: Count=6, Max=106559, Min=99392, Avg=103157.33, 90=105791, 99=106559, 99.9=106559, 99.99=106559]
FAILED: Count=6, Max=106559, Min=99392, Avg=103157.33, 90=105791, 99=106559, 99.9=106559, 99.99=106559]
2019-10-27 18:08:51 INFO  ClientCnxn:1112 - Opening socket connection to server localhost/127.0.0.1:2180. Will not attempt to authenticate using SASL (unknown error)
2019-10-27 18:08:51 INFO  ClientCnxn:959 - Socket connection established, initiating session, client: /127.0.0.1:51795, server: localhost/127.0.0.1:2180
2019-10-27 18:08:51 INFO  ClientCnxn:1394 - Session establishment complete on server localhost/127.0.0.1:2180, sessionid = 0x1000c6e0658000a, negotiated timeout = 30000
2019-10-27 18:08:51 WARN  ClientCnxn:1246 - Session 0x1000c6e0658000a for server localhost/127.0.0.1:2180, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Unreasonable length = 1073590
	at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
	at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
	at org.apache.zookeeper.proto.GetDataResponse.deserialize(GetDataResponse.java:56)
	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:919)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
2019-10-27 18:08:51 ERROR ZKClient:128 - Error when reading a path:/user5344789772525948776,tableName:usertable
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /user5344789772525948776
{code}
 

 

From the log I saw that *jute.maxbuffer* on my client side is 4194304 bytes, but I cannot read a znode that is almost 1 MB.
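The failure in the stack trace comes from BinaryInputArchive's length check. A minimal sketch of that behavior (assumed constant and simplified check, not the exact BinaryInputArchive source): a ~1 MB znode passes against the logged 4 MB value but fails against the old 1 MB client default, which is what the ticket's title suggests was actually in effect.

```java
import java.io.IOException;

// Sketch of the client-side length check (assumed default value and
// simplified logic, not the exact BinaryInputArchive source): a read is
// rejected when the buffer length exceeds the effective jute.maxbuffer,
// so a ~1 MB znode fails if the effective client limit is the 1 MB default
// even though a larger value was logged.
public class JuteLengthCheck {
    static final int DEFAULT_MAX_BUFFER = 0xfffff; // 1048575 bytes (~1 MB)

    static void checkLength(int len, int maxBuffer) throws IOException {
        if (len < 0 || len > maxBuffer) {
            throw new IOException("Unreasonable length = " + len);
        }
    }

    public static void main(String[] args) throws IOException {
        // The length from the ticket's stack trace passes against 4 MB...
        checkLength(1073590, 4 * 1024 * 1024);
        // ...but fails against the ~1 MB default, reproducing the error.
        try {
            checkLength(1073590, DEFAULT_MAX_BUFFER);
        } catch (IOException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```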
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 week ago 0|z07zdc:
ZooKeeper ZOOKEEPER-3592

cnxTimeout should be also configurable via server config

Improvement Resolved Minor Not A Problem Unassigned Andrew Kyle Purtell Andrew Kyle Purtell 23/Oct/19 14:30   24/Oct/19 17:26 24/Oct/19 17:26     server   0 3   Currently cnxTimeout must be set with a system property. A trivial usability improvement would be to also allow it to be managed with a new option in the server configuration file. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks ago 0|z07w0w:
ZooKeeper ZOOKEEPER-3591

Inconsistent resync with dynamic reconfig

Bug Open Major Unresolved Unassigned Alex Mirgorodskiy Alex Mirgorodskiy 23/Oct/19 00:40   20/Dec/19 15:19   3.5.5   server   0 2   We've run into a problem where one of the zookeeper instances lost most of its data after its zk process was restarted. We suspect an interaction between dynamic reconfiguration and snapshot-based resync of that instance. Details and some amateurish analysis are below. We can also upload transaction logs, if need be.

We have a 6-instance ensemble running version 3.5.5 with 3 quorum members and 3 observers. One of the observers (Instance 6) saw its db shrink from 3162 znodes down to 10 after that instance restarted:

> 2019-10-13T16:44:19.060+0000 [.zk-monitor-0] Monitor command mntr: zk_version 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3162
> --
> 2019-10-13T16:48:32.713+0000 [.zk-monitor-0] Monitor command mntr: zk_version 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 10

Contrast it with Instance 1 that was the leader at the time, and whose znode_count remained stable around 3000:

> 2019-10-13T16:44:48.625+0000 [.zk-monitor-0] Monitor command mntr: zk_version 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3178
> --
> ...
> --
> 2019-10-13T16:48:48.731+0000 [.zk-monitor-0] Monitor command mntr: zk_version 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3223

It appears that the problem had happened 30 minutes earlier, when Instance 6 got resynced from the leader via the Snap method, yet skipped creating an on-disk snapshot. The end result was that the in-memory state was fine, but there was only the primordial snapshot.0 on disk, and transaction logs only started after the missing snapshot:

$ ls -l version-2
> total 1766
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 09:14 acceptedEpoch
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 10:12 currentEpoch
> -rw-r--r-- 1 daautomation daautomation 2097168 Oct 13 09:44 log.6000002e0
> -rw-r--r-- 1 daautomation daautomation 1048592 Oct 13 10:09 log.600001f1b
> -rw-r--r-- 1 daautomation daautomation 4194320 Oct 13 12:16 log.600003310
> -rw-r--r-- 1 daautomation daautomation 770 Oct 13 09:14 snapshot.0

So the zk reboot wiped out most of the state.

Dynamic reconfig might be relevant here. Instance 6 started as an observer, got removed, and immediately re-added as a participant. Instance 2 went the other way, from participant to observer:

> 2019-10-13T16:14:19.323+0000 ZK reconfig: removing node 6
> 2019-10-13T16:14:19.359+0000 ZK reconfig: adding server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181
> 2019-10-13T16:14:19.399+0000 ZK reconfig: adding server.2=10.80.209.131:2888:3888:observer;0.0.0.0:2181

Looking at the logs, Instance 6 started and received a resync snapshot from the leader right before the dynamic reconfig:

> 2019-10-13T16:14:19.284+0000 [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Getting a snapshot from leader 0x6000002dd
> ...
> 2019-10-13T16:14:19.401+0000 [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Got zxid 0x6000002de expected 0x1

Had it processed the NEWLEADER packet afterwards, it would've persisted the snapshot locally. But there's no NEWLEADER message in the Instance 6 log. Instead, there's a "changes proposed in reconfig" exception, likely a result of the instance getting dynamically removed and re-added as a participant:

> 2019-10-13T16:14:19.467+0000 [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Becoming a non-voting participant
> 2019-10-13T16:14:19.467+0000 [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Exception when observing the leaderjava.lang.Exception: changes proposed in reconfig\n\tat org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:506)\n\tat org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:74)\n\tat org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1258)

Perhaps the NEWLEADER packet was still in the socket, but sitting behind INFORMANDACTIVATE, whose exception prevented us from processing NEWLEADER?

Also, it may or may not be related, but this area got changed recently as part of https://issues.apache.org/jira/browse/ZOOKEEPER-3104.
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
12 weeks, 6 days ago 0|z07uzs:
ZooKeeper ZOOKEEPER-3590

ZooKeeper is unable to set zookeeper.sasl.client.canonicalize.hostname using a system property

Bug Closed Minor Fixed Unassigned Aristotelhs Aristotelhs 22/Oct/19 13:59   14/Feb/20 10:23 22/Nov/19 17:43 3.5.6 3.6.0, 3.5.7 java client   0 1 0 3000   After some reworking of the ZooKeeper SASL implementation (https://issues.apache.org/jira/browse/ZOOKEEPER-3156), the knob zookeeper.sasl.client.canonicalize.hostname was introduced to disable hostname canonicalization. However, this option is not included in ZKClientConfig's handleBackwardCompatibility(), which I assume is an omission. This creates an issue if the ZooKeeper library is hidden behind another library that does not provide an interface to change this value.

Therefore it should also be settable via system properties, like the rest of the variables.
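The fix pattern can be sketched like this (a simplified stand-in for ZKClientConfig, not the actual class): during backward-compatibility handling, copy the system property into the per-client config the same way the other legacy properties are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix pattern (simplified stand-in for ZKClientConfig, not
// the actual class): during backward-compatibility handling, copy the
// system property into the per-client config so wrappers that only expose
// system properties can still control it.
public class ClientConfigSketch {
    static final String CANONICALIZE =
        "zookeeper.sasl.client.canonicalize.hostname";

    final Map<String, String> properties = new HashMap<>();

    void handleBackwardCompatibility() {
        // ...other legacy properties are copied the same way...
        String v = System.getProperty(CANONICALIZE);
        if (v != null) {
            properties.put(CANONICALIZE, v);
        }
    }
}
```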
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 6 days ago 0|z07ujc:
ZooKeeper ZOOKEEPER-3589

3.4-branch has potential data inconsistency caused by ZOOKEEPER-3104

Bug Open Critical Unresolved Unassigned Pierre Yin Pierre Yin 22/Oct/19 03:53   28/Nov/19 16:35   3.4.13, 3.4.14   server   0 2 0 12600   ZOOKEEPER-3104 describes one critical data inconsistency risk.

The risk also exists in 3.4 branch.

In our 3.4.13 production cluster, the data inconsistency has happened many times.

After digging through some transaction logs and snapshots, we believe that ZOOKEEPER-3104 is the main contributor to our data inconsistency.

The risk probability may be higher than expected in a real production environment. The serialization of a big DataTree may lead to a large risk time window under high write traffic. Any failure during that window can cause data inconsistency.

Data inconsistency is all but unacceptable under ZooKeeper's semantics.

This issue is already fixed in 3.6. But I think it is very necessary to backport ZOOKEEPER-3104 to branch-3.4, especially since migrating from 3.4 to 3.5 takes considerable effort to evaluate the compatibility risk in a real production environment.

I have submitted a GitHub pull request to fix it. Can anyone help us review it, please?

Many thanks.

 
100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks, 2 days ago https://github.com/apache/zookeeper/pull/1123 0|z07tsw:
ZooKeeper ZOOKEEPER-3588

BambuPoker - Situs Poker Uang Asli Indonesia Terbesar dan Terpopuler Tanpa Robot

Test Open Major Unresolved Unassigned Bambupoker Bambupoker 19/Oct/19 05:14   19/Oct/19 05:20   3.4.15 4.0.0 build-infrastructure, java client, server   0 1   none 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks, 5 days ago 0|z07r74:
ZooKeeper ZOOKEEPER-3587

Add documentation about Docker

Improvement In Progress Minor Unresolved maoling maoling maoling 17/Oct/19 22:08   23/Nov/19 01:47       documentation   0 2 0 3000   Follow-up documentation work: [https://github.com/apache/zookeeper/pull/1075] 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks, 6 days ago 0|z07po0:
ZooKeeper ZOOKEEPER-3586

Write Log To Multiple Drives

Improvement Open Major Unresolved Unassigned David Mollitor David Mollitor 17/Oct/19 18:33   23/Dec/19 11:07       server   0 2   Allow the ZooKeeper server to write the transaction log to multiple drives. I can imagine a few different ways of doing this:

# Allow special namespace ZNodes under the root node. Upon creation, the user can specify the location of the log file for all activity under this node.
# Write each transaction out to more than one drive and return an ACK when any of the writes complete. Cancel any pending writes and delete the files that are furthest behind on merge.
# Write each transaction out to more than one drive and obtain a lock on a target drive before each write. If the lock for the first drive is taken, attempt to get the lock on the second drive, and so on, ... combine logs on merge being mindful that one of the transactions may have failed and created a small hole in the middle of the log.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 3 days ago 0|z07pj4:
ZooKeeper ZOOKEEPER-3585

Add documentation about RequestProcessors

Improvement In Progress Major Unresolved maoling maoling maoling 17/Oct/19 06:23   17/Dec/19 08:00       documentation   0 1 0 3600   100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks ago 0|z07oo8:
ZooKeeper ZOOKEEPER-3584

"NoAuth" error message is ambiguous

Improvement Patch Available Trivial Unresolved Lars Francke Lars Francke Lars Francke 17/Oct/19 05:13   17/Oct/19 07:59   3.5.6       0 2   Currently we get a NoAuthException printed as "NoAuth"

 

Unfortunately, "Auth" could mean "Authentication" or "Authorization", so I propose changing the error message to "Not authenticated".

I won't change the NoAuthException class name.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
22 weeks ago 0|z07ojc:
ZooKeeper ZOOKEEPER-3583

Add new APIs to get node type and TTL time info

Improvement In Progress Major Unresolved maoling maoling maoling 15/Oct/19 22:05   01/Dec/19 23:35       other, scripts   1 1 0 2400    

 

stat -d to show more details: node type and TTL time info
{code:java}
[zk: 127.0.0.1:2180(CONNECTED) 15] stat /test
cZxid = 0xfa3c001b7ce4
ctime = Tue Oct 15 14:07:03 CST 2019
mZxid = 0xfa3c001b7d32
mtime = Tue Oct 15 16:52:28 CST 2019
pZxid = 0xfa3c001b7d33
cversion = 11
dataVersion = 42
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 11
{code}
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 1 day ago 0|z07mm0:
ZooKeeper ZOOKEEPER-3582

Refactor the async API call to lambda style

Improvement Resolved Minor Fixed Unassigned maoling maoling 15/Oct/19 01:51   28/Dec/19 10:19 28/Dec/19 10:19   3.7.0 server   0 2 0 3000   For example:
{code:java}
if (recursive) {
    ZKUtil.visitSubTreeDFS(zk, path, watch, new StringCallback() {
        @Override
        public void processResult(int rc, String path, Object ctx, String name) {
            out.println(path);
        }
    });
}
{code}
Refactor to:
{code:java}
ZKUtil.visitSubTreeDFS(zk, path, watch, (rc, path1, ctx, name) -> out.println(path1));
{code}
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 2 days ago 0|z07ld4:
ZooKeeper ZOOKEEPER-3581

use factory design pattern to refactor ZooKeeperMain

Improvement Open Minor Unresolved Unassigned maoling maoling 14/Oct/19 22:21   28/Feb/20 02:30           1 6   Use the factory design pattern to refactor ZooKeeperMain and make the code more elegant:
{code:java}
static {
    commandMap.put("connect", "host:port");
    commandMap.put("history", "");
    commandMap.put("redo", "cmdno");
    commandMap.put("printwatches", "on|off");
    commandMap.put("quit", "");

    new CloseCommand().addToMap(commandMapCli);
    new CreateCommand().addToMap(commandMapCli);
    new DeleteCommand().addToMap(commandMapCli);
    new DeleteAllCommand().addToMap(commandMapCli);
    // Deprecated: rmr
    new DeleteAllCommand("rmr").addToMap(commandMapCli);
    new SetCommand().addToMap(commandMapCli);
    new GetCommand().addToMap(commandMapCli);
    new LsCommand().addToMap(commandMapCli);
    new Ls2Command().addToMap(commandMapCli);
    new GetAclCommand().addToMap(commandMapCli);
    new SetAclCommand().addToMap(commandMapCli);
    new StatCommand().addToMap(commandMapCli);
    new SyncCommand().addToMap(commandMapCli);
    new SetQuotaCommand().addToMap(commandMapCli);
    new ListQuotaCommand().addToMap(commandMapCli);
    new DelQuotaCommand().addToMap(commandMapCli);
    new AddAuthCommand().addToMap(commandMapCli);
    new ReconfigCommand().addToMap(commandMapCli);
    new GetConfigCommand().addToMap(commandMapCli);
    new RemoveWatchesCommand().addToMap(commandMapCli);
    new GetEphemeralsCommand().addToMap(commandMapCli);
    new GetAllChildrenNumberCommand().addToMap(commandMapCli);
    new VersionCommand().addToMap(commandMapCli);

    // add all to commandMap
    for (Entry<String, CliCommand> entry : commandMapCli.entrySet()) {
        commandMap.put(entry.getKey(), entry.getValue().getOptionStr());
    }
}
{code}
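One way the proposed factory refactor could look: each command is produced by a factory (`Supplier`) and registered in a single loop, so adding a command means adding one entry to a list. This is a hedged sketch; `CliCommand` below is a simplified stand-in for ZooKeeper's `org.apache.zookeeper.cli.CliCommand`, and `CommandRegistry` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-in for org.apache.zookeeper.cli.CliCommand.
interface CliCommand {
    String getCmdStr();    // e.g. "stat"
    String getOptionStr(); // e.g. "[-w] path"
}

class CommandRegistry {
    /** Builds the usage map from command factories in a single loop. */
    static Map<String, String> buildCommandMap(List<Supplier<CliCommand>> factories) {
        Map<String, String> commandMap = new LinkedHashMap<>();
        for (Supplier<CliCommand> factory : factories) {
            CliCommand cmd = factory.get();
            commandMap.put(cmd.getCmdStr(), cmd.getOptionStr());
        }
        return commandMap;
    }
}
```

With this shape, the long run of `new XxxCommand().addToMap(...)` lines collapses into a list of constructor references such as `StatCommand::new`.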
newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 4 days ago 0|z07l7s:
ZooKeeper ZOOKEEPER-3580

Maven Build error: Circular property definition

Bug Resolved Major Not A Problem Unassigned Javi Roman Javi Roman 14/Oct/19 13:27   20/Oct/19 13:07 20/Oct/19 13:07 3.5.5   server   0 1   Fresh download from release site:

cd apache-zookeeper-3.5.5

mvn clean install
{code:java}
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache ZooKeeper 3.5.5 ............................. SUCCESS [ 2.854 s]
[INFO] Apache ZooKeeper - Documentation ................... SUCCESS [ 2.991 s]
[INFO] Apache ZooKeeper - Jute ............................ SUCCESS [ 9.815 s]
[INFO] Apache ZooKeeper - Server .......................... FAILURE [ 0.253 s]
[INFO] Apache ZooKeeper - Client .......................... SKIPPED
[INFO] Apache ZooKeeper - Recipes ......................... SKIPPED
[INFO] Apache ZooKeeper - Recipes - Election .............. SKIPPED
[INFO] Apache ZooKeeper - Recipes - Lock .................. SKIPPED
[INFO] Apache ZooKeeper - Recipes - Queue ................. SKIPPED
[INFO] Apache ZooKeeper - Assembly 3.5.5 .................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.172 s
[INFO] Finished at: 2019-10-14T19:23:04+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:properties-maven-plugin:1.0.0:read-project-properties (default) on project zookeeper: Circular property definition: env.BASH_FUNC__module_raw%%=() { unset _mlshdbg;
[ERROR] if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '1' ]; then
[ERROR] case "$-" in
[ERROR] *v*x*)
[ERROR] set +vx;
[ERROR] _mlshdbg='vx'
[ERROR] ;;
[ERROR] *v*)
[ERROR] set +v;
[ERROR] _mlshdbg='v'
[ERROR] ;;
[ERROR] *x*)
[ERROR] set +x;
[ERROR] _mlshdbg='x'
[ERROR] ;;
[ERROR] *)
[ERROR] _mlshdbg=''
[ERROR] ;;
[ERROR] esac;
[ERROR] fi;
[ERROR] unset _mlre _mlIFS;
[ERROR] if [ -n "${IFS+x}" ]; then
[ERROR] _mlIFS=$IFS;
[ERROR] fi;
[ERROR] IFS=' ';
[ERROR] for _mlv in ${MODULES_RUN_QUARANTINE:-};
[ERROR] do
[ERROR] if [ "${_mlv}" = "${_mlv##*[!A-Za-z0-9_]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
[ERROR] if [ -n "`eval 'echo ${'$_mlv'+x}'`" ]; then
[ERROR] _mlre="${_mlre:-}${_mlv}_modquar='`eval 'echo ${'$_mlv'}'`' ";
[ERROR] fi;
[ERROR] _mlrv="MODULES_RUNENV_${_mlv}";
[ERROR] _mlre="${_mlre:-}${_mlv}='`eval 'echo ${'$_mlrv':-}'`' ";
[ERROR] fi;
[ERROR] done;
[ERROR] if [ -n "${_mlre:-}" ]; then
[ERROR] eval `eval ${_mlre}/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '"$@"'`;
[ERROR] else
[ERROR] eval `/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@"`;
[ERROR] fi;
[ERROR] _mlstatus=$?;
[ERROR] if [ -n "${_mlIFS+x}" ]; then
[ERROR] IFS=$_mlIFS;
[ERROR] else
[ERROR] unset IFS;
[ERROR] fi;
[ERROR] unset _mlre _mlv _mlrv _mlIFS;
[ERROR] if [ -n "${_mlshdbg:-}" ]; then
[ERROR] set -$_mlshdbg;
[ERROR] fi;
[ERROR] unset _mlshdbg;
[ERROR] return $_mlstatus
[ERROR] } -> MODULES_SILENT_SHELL_DEBUG:-0=null -> IFS+x=null -> MODULES_RUN_QUARANTINE:-=null -> _mlv=null -> _mlv##*[!A-Za-z0-9_]=null -> _mlv=null
[ERROR] -> [Help 1]
{code}
mvn -version


Apache Maven 3.5.4 (Red Hat 3.5.4-5)
Maven home: /usr/share/maven
Java version: 1.8.0_222, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.fc30.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.2.18-200.fc30.x86_64", arch: "amd64", family: "unix"
build-problem 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 3 days ago Solved by building with Ant:

$ ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean tar
[---]
BUILD SUCCESSFUL
Total time: 1 minute 14 seconds
0|z07kq8:
ZooKeeper ZOOKEEPER-3579

Handle NPE gracefully when the watcher parameter of the ZooKeeper Java client is null

Bug In Progress Minor Unresolved Zili Chen maoling maoling 14/Oct/19 06:47   07/Mar/20 05:29       java client   1 2 0 13200   When we use the native Java client:
{code:java}
try {
    zk = new ZooKeeper(connectString, (int) sessionTimeout, null);
} catch (IOException e) {
    throw new DBException("Creating connection failed.");
}
{code}
We will get the following. This issue has existed in all ZooKeeper releases for a long time:
{code:java}
2019-10-14 18:41:49 ERROR ClientCnxn:537 - Error while calling watcher
java.lang.NullPointerException
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2019-10-14 18:41:50 ERROR ClientCnxn:537 - Error while calling watcher
java.lang.NullPointerException
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
{code}
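The proposed guard can be sketched like this: skip dispatch when the registered watcher is null instead of letting the event thread throw a `NullPointerException`. `Watcher` and `SafeEventThread` below are simplified stand-ins, not the actual `ClientCnxn.EventThread` code:

```java
// Simplified stand-in for org.apache.zookeeper.Watcher.
interface Watcher {
    void process(String event);
}

class SafeEventThread {
    /** Returns true if the event was delivered, false if it was skipped. */
    static boolean processEvent(Watcher watcher, String event) {
        if (watcher == null) {
            return false; // proposed guard: no watcher registered, nothing to call
        }
        try {
            watcher.process(event);
            return true;
        } catch (Throwable t) {
            return false; // mirrors the existing "Error while calling watcher" handling
        }
    }
}
```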
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 2 days ago 0|z07keg:
ZooKeeper ZOOKEEPER-3578

Add a new CLI: multi

New Feature In Progress Major Unresolved maoling maoling maoling 14/Oct/19 06:35   14/Oct/19 22:49       scripts   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 2 days ago 0|z07kd4:
ZooKeeper ZOOKEEPER-3577

ZooKeeper Dynamic Reconfiguration does not support SSL

Bug Open Minor Unresolved Unassigned zhaoyan zhaoyan 14/Oct/19 06:27   14/Oct/19 06:31   3.5.5   server   0 2   ZooKeeper Dynamic Reconfiguration does not support SSL

 

{{server.1=125.23.63.23:2780:2783:participant;2791}}

 

{{2791}} must be a plaintext port; an SSL port is not supported

 

reason:

org.apache.zookeeper.server.quorum.QuorumPeerConfig#setupClientPort

only {{clientAddr}} is used:

{code:java}
if (qs != null && qs.clientAddr != null) clientPortAddress = qs.clientAddr;
{code}

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 3 days ago 0|z07kcw:
ZooKeeper ZOOKEEPER-3576

Zookeeper Fails with AUTH_FAILED state with SASL

Bug Open Major Unresolved Unassigned Ahshan Ahshan 14/Oct/19 02:02   14/Oct/19 03:44   3.4.10   kerberos, security   0 1   Although I'm able to authenticate successfully with the Kerberos account *zookeeper/kafka-d1.eng.company.com@COMPANY.COM*, I still encounter *AUTH_FAILED* during client authentication.

The following verification was made from my end:
# Checked DNS (both forward and reverse)

nslookup kafka-d1.eng.company.com
Server: 172.16.2.3
Address: 172.16.2.3#53

Name: kafka-d1.eng.company.com
Address: 10.14.61.17

Reverse DNS

nslookup 10.14.61.17
Server: 172.16.2.3
Address: 172.16.2.3#53

17.61.14.10.in-addr.arpa name = kafka-d1.eng.company.com.

 

2. Kerberos authentication

kinit -kt /etc/keytabs/zookeeper.keytab -V zookeeper/kafka-d1.eng.company.com
Using default cache: /tmp/krb5cc_0
Using principal: zookeeper/kafka-d1.eng.company.com@COMPANY.COM
Using keytab: /etc/keytabs/zookeeper.keytab
Authenticated to Kerberos v5

 

Below is the krb5 configuration file:

cat /etc/krb5.conf
[libdefaults]
default_realm = COMPANY.COM
dns_lookup_kdc = true
dns_lookup_realm = true
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts
default_tkt_enctypes = aes256-cts
permitted_enctypes = aes256-cts
udp_preference_limit = 1
kdc_timeout = 3000
ignore_acceptor_hostname = true
[realms]
COMPANY.COM = {
    kdc = srv-ussc-dc01e.company.com
    admin_server = srv-exxx.company.com
    kdc = srv-exxxe.company.com
}

[domain_realm]
kafka-d1.eng.company.com = COMPANY.COM

 

export JVMFLAGS=-Djava.security.auth.login.config=/usr/share/zookeeper/conf/client_jaas.conf -Dsun.security.krb5.debug=true

 

cat /usr/share/zookeeper/conf/client_jaas.conf
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
debug=true
keyTab="/etc/keytabs/zookeeper.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/kafka-d1.eng.company.com@COMPANY.COM;
};

*Error Message :[^zoo.cfg][^zookeeper_server.log]*
{noformat}
./zkCli.sh -server kafka-d1.eng.company.com:2181
Connecting to kafka-d1.eng.company.com:2181
2019-10-14 02:08:16,625 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2019-10-14 02:08:16,628 [myid:] - INFO [main:Environment@100] - Client environment:host.name=kafka-d1.eng.company.com
2019-10-14 02:08:16,628 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_201
2019-10-14 02:08:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-10-14 02:08:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/opt/jdk1.8.0_201/jre
2019-10-14 02:08:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/share/zookeeper/bin/../build/classes:/usr/share/zookeeper/bin/../build/lib/*.jar:/usr/share/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/share/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/share/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/share/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/share/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/share/zookeeper/bin/../zookeeper-3.4.10.jar:/usr/share/zookeeper/bin/../src/java/lib/*.jar:/usr/share/zookeeper/bin/../conf:
2019-10-14 02:08:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-327.el7.x86_64
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2019-10-14 02:08:16,631 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/usr/share/zookeeper-3.4.10/bin
2019-10-14 02:08:16,632 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=kafka-d1.eng.company.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@306a30c7
Welcome to ZooKeeper!
JLine support is enabled
Debug is true storeKey true useTicketCache false useKeyTab true doNotPrompt false ticketCache is null isInitiator true KeyTab is /etc/keytabs/zookeeper.keytab refreshKrb5Config is false principal is zookeeper/kafka-d1.eng.company.com@COMPANY.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
[zk: kafka-d1.eng.company.com:2181(CONNECTING) 0] principal is zookeeper/kafka-d1.eng.company.com@COMPANY.COM
Will use keytab
Commit Succeeded 2019-10-14 02:08:16,971 [myid:] - INFO [main-SendThread(kafka-d1.eng.company.com:2181):Login@295] - Client successfully logged in.
2019-10-14 02:08:16,973 [myid:] - INFO [Thread-1:Login$1@128] - TGT refresh thread started.
2019-10-14 02:08:16,975 [myid:] - INFO [Thread-1:Login@303] - TGT valid starting at: Mon Oct 14 02:08:16 EDT 2019
2019-10-14 02:08:16,976 [myid:] - INFO [Thread-1:Login@304] - TGT expires: Mon Oct 14 12:08:16 EDT 2019
2019-10-14 02:08:16,976 [myid:] - INFO [Thread-1:Login$1@183] - TGT refresh sleeping until: Mon Oct 14 10:08:57 EDT 2019
2019-10-14 02:08:16,977 [myid:] - INFO [main-SendThread(kafka-d1.eng.company.com:2181):SecurityUtils$1@124] - Client will use GSSAPI as SASL mechanism.
2019-10-14 02:08:16,988 [myid:] - INFO [main-SendThread(kafka-d1.eng.company.com:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server kafka-d1.eng.company.com/10.14.61.17:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2019-10-14 02:08:16,994 [myid:] - INFO [main-SendThread(kafka-d1.eng.company.com:2181):ClientCnxn$SendThread@876] - Socket connection established to kafka-d1.eng.company.com/10.14.61.17:2181, initiating session
2019-10-14 02:08:17,002 [myid:] - INFO [main-SendThread(kafka-d1.eng.company.com:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server kafka-d1.eng.company.com/10.14.61.17:2181, sessionid = 0x16dc8cbdb3b0002, negotiated timeout = 30000WATCHER::WatchedEvent state:SyncConnected type:None path:null
2019-10-14 02:08:17,024 [myid:] - ERROR [main-SendThread(kafka-d1.eng.company.com:2181):ZooKeeperSaslClient@247] - SASL authentication failed using login context 'Client'.WATCHER::WatchedEvent state:AuthFailed type:None path:null{noformat}
 

 
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
22 weeks, 3 days ago 0|z07k0o:
ZooKeeper ZOOKEEPER-3575

ZOOKEEPER-3573 Moving sending packets in Learner to a separate thread

Sub-task Resolved Major Fixed Unassigned Jie Huang Jie Huang 13/Oct/19 00:59   25/Jan/20 12:49 25/Jan/20 12:49 3.6.0 3.7.0 server   0 1 0 15000   After changing to close the socket asynchronously, the shutdown process can proceed while the socket is being closed. However, the shutdown process could still stall if a thread being shutdown is writing to the socket. For example, the SyncRequestProcessor flushes all ACK packets in queue when shutdown is called, which calls Learner.writePacket(), which will not return (with an IO exception) until the socket finishes closing. So it's still delayed by the socket closing time. 

To get around the delay, we move Learner.writePacket() to a separate thread. The tricky part is to handle the IO exception thrown by Learner.writePacket(). Currently, the IO exception is caught by different callers in different ways. For example, if an IO exception caught during revalidateSession, the session is closed and removed. In other cases, like in FollowerRequestProcessor and SendAckRequestProcess, the quorum socket is closed when the IO exception is caught. After moving it to a thread, the callers won't be able to catch and handle the exception. We need to handle it within the sending function. We reason that if an IO exception is thrown on the quorum socket of a follower, it only makes sense to shut down the server. So we make the sending thread a ZooKeeperCriticalThread.
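The sending-thread approach described above can be sketched roughly like this. `PacketSender` and its callbacks are illustrative names, not the committed Learner change; the key point is that the IO exception is handled inside the thread (via a fatal-error callback that would shut down the server) because callers can no longer catch it:

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of moving packet writes to a dedicated thread.
class PacketSender extends Thread {
    interface PacketWriter {
        void write(byte[] packet) throws IOException; // wraps Learner.writePacket()
    }

    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final PacketWriter writer;
    private final Runnable onFatalError; // e.g. shut down the server (critical-thread behavior)

    PacketSender(PacketWriter writer, Runnable onFatalError) {
        this.writer = writer;
        this.onFatalError = onFatalError;
        setDaemon(true);
    }

    void enqueue(byte[] packet) {
        queue.add(packet); // callers no longer block on, or see exceptions from, the socket
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                writer.write(queue.take());
            }
        } catch (InterruptedException ie) {
            // normal shutdown path
        } catch (IOException io) {
            onFatalError.run(); // quorum socket broken: only shutting down makes sense
        }
    }
}
```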

 
100% 100% 15000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
7 weeks, 5 days ago 0|z07jfs:
ZooKeeper ZOOKEEPER-3574

ZOOKEEPER-3573 Close quorum socket asynchronously to avoid server shutdown stalled by long socket closing time

Sub-task Open Major Unresolved Unassigned Jie Huang Jie Huang 12/Oct/19 18:56   26/Feb/20 17:24   3.6.0   server   0 1 0 13200   Since we can't use the SO_LINGER option or find a substitute to close a TLS socket quickly in JDK 11, we call close() asynchronously so the shutdown can proceed and a new leader election can be started while the socket is being closed.

 

 
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 5 days ago 0|z07je8:
ZooKeeper ZOOKEEPER-3573

Dealing with long TLS connection closing time without SO_LINGER option

Improvement Open Major Unresolved Unassigned Jie Huang Jie Huang 12/Oct/19 18:49   20/Dec/19 12:41   3.6.0   server   0 2   ZOOKEEPER-3574, ZOOKEEPER-3575 As described in ZOOKEEPER-3384, with SSL sockets, a close_notify is required to be sent before closing the write side of a connection. When the send buffer is full and the writing is blocked, it will take a long time to send close_notify thus a long time to close the socket. The long closing time on followers with a partitioned-away leader would stall the shutdown process and delay a new leader election to establish a new quorum. As a result, the ensemble would be unavailable for a long time.

In ZOOKEEPER-3384, SO_LINGER option is used to close the socket quickly (and potentially uncleanly). In JDK 11, however, SO_LINGER option is not honored so we need a new way to avoid the long quorum unavailable time.
100% 28200 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
12 weeks, 6 days ago 0|z07je0:
ZooKeeper ZOOKEEPER-3572

Correctly handle ZooInspector resources

Improvement Open Minor Unresolved Unassigned qiang Liu qiang Liu 11/Oct/19 04:13   11/Oct/19 21:26   3.4.14   contrib   0 1 0 3600   windows7

intellij idea
ZooInspector starts up with a blank window when ZooInspector#main is run directly from the IDE.

This is caused by the icon resources failing to load; by default the icons are loaded from /usr/local/share and fall back to the classpath.

Because of the parent POM configuration, resources are excluded and the resource directory is not the default, so the icon resources are not copied to target/classes.
100% 100% 3600 0 development-mode, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 6 days ago 0|z07hvc:
ZooKeeper ZOOKEEPER-3571

Create test base directory on test started

Improvement Resolved Major Fixed Zili Chen Zili Chen Zili Chen 10/Oct/19 09:28   14/Nov/19 18:58 14/Nov/19 10:45   3.6.0 tests   0 2 0 6000   There are many, many times I fail tests because {{${build.test.dir}}} is not present. We can simply ensure the directory on test started. 100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
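The fix for the missing test base directory can be sketched with `java.nio.file`: create the directory idempotently before tests run. The directory name below is a stand-in for the Maven property {{${build.test.dir}}}; the real change belongs in the test harness:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TestDirs {
    /** Ensures the base test directory exists; a no-op if it is already present. */
    static Path ensureBaseDir(String dir) throws IOException {
        return Files.createDirectories(Paths.get(dir));
    }
}
```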
18 weeks ago 0|z07gwo:
ZooKeeper ZOOKEEPER-3570

Make the special client xids constants

Improvement Resolved Minor Fixed maoling maoling maoling 08/Oct/19 03:31   21/Nov/19 00:14 20/Nov/19 23:59 3.6.0 3.6.0 server   0 2 0 1200   In *ClientCnxn* we have hard-coded xids, which is not elegant.

We need constants for these xids:
{code:java}
if (replyHdr.getXid() == -2) {
    // -2 is the xid for pings
    if (LOG.isDebugEnabled()) {
        LOG.debug("Got ping response for sessionid: 0x"
                + Long.toHexString(sessionId)
                + " after "
                + ((System.nanoTime() - lastPingSentNs) / 1000000)
                + "ms");
    }
    return;
}
if (replyHdr.getXid() == -4) {
    // -4 is the xid for AuthPacket
    if (replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {
        state = States.AUTH_FAILED;
        eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None, Watcher.Event.KeeperState.AuthFailed, null));
        eventThread.queueEventOfDeath();
    }
    if (LOG.isDebugEnabled()) {
        LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId));
    }
    return;
}
if (replyHdr.getXid() == -1) {
{code}
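The cleanup could look like this: name each special xid once and branch on the constants instead of bare -1/-2/-4 literals. The constant names below are illustrative, not necessarily the ones that were committed:

```java
// Hypothetical named constants for the special client xids.
class ClientXids {
    static final int PING_XID = -2;         // xid used for ping replies
    static final int AUTHPACKET_XID = -4;   // xid used for AuthPacket replies
    static final int NOTIFICATION_XID = -1; // xid used for watch notifications

    /** Classifies a reply header xid by its special meaning. */
    static String classify(int xid) {
        switch (xid) {
            case PING_XID:         return "ping";
            case AUTHPACKET_XID:   return "auth";
            case NOTIFICATION_XID: return "notification";
            default:               return "reply";
        }
    }
}
```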
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks ago 0|z07dps:
ZooKeeper ZOOKEEPER-3569

Compile error due to LOGSTREAM being null when passed to fprintf

Bug Open Major Unresolved Unassigned Ronald Fenner Ronald Fenner 07/Oct/19 19:46   08/Oct/19 01:03       c client   0 2   I'm trying to compile the source and am getting this error:

make all-am
make[1]: Entering directory `/home/ec2-user/zookeeper/zookeeper-client/zookeeper-client-c'
/bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -Wdeclaration-after-statement -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -Wdeclaration-after-statement -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o
src/zookeeper.c: In function 'print_completion_queue':
src/zookeeper.c:2542:5: error: null argument where non-null required (argument 1) [-Werror=nonnull]
fprintf(LOGSTREAM,"Completion queue: ");
^~~~~~~
src/zookeeper.c:2544:9: error: null argument where non-null required (argument 1) [-Werror=nonnull]
fprintf(LOGSTREAM,"empty\n");
^~~~~~~
src/zookeeper.c:2550:9: error: null argument where non-null required (argument 1) [-Werror=nonnull]
fprintf(LOGSTREAM,"%d,",cptr->xid);
^~~~~~~
src/zookeeper.c:2553:5: error: null argument where non-null required (argument 1) [-Werror=nonnull]
fprintf(LOGSTREAM,"end\n");
^~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [zookeeper.lo] Error 1
make[1]: Leaving directory `/home/ec2-user/zookeeper/zookeeper-client/zookeeper-client-c'
make: *** [all] Error 2

 

Looking through the code, in include/zookeeper_log.h at line 30 LOGSTREAM is defined as NULL. This causes the above error.

In the 3.4.x branch it was getLogStream().

I believe for the 3.5 branch this should be zoo_get_log_stream().

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
23 weeks, 2 days ago 0|z07dao:
ZooKeeper ZOOKEEPER-3568

Remove Netty support from 3.4

Task Open Major Unresolved Andor Molnar Andor Polgari Andor Polgari 03/Oct/19 16:53   04/Oct/19 04:18   3.4.14 3.4.15 java client, server   0 2   branch-3.4 is still on Netty 3 which is not maintained by the Netty-team anymore. We have no intention of upgrading to the new version, instead we propose to remove Netty from the entire 3.4 codebase and encourage users to upgrade to 3.5 which is stable and backward compatible. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks ago 0|z079y0:
ZooKeeper ZOOKEEPER-3567

Add SSL support for the zk python client

Improvement Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 03/Oct/19 14:22   27/Jan/20 06:14 27/Jan/20 05:26 4.0.0 3.6.0, 3.7.0 c client, contrib   0 1 0 3600   As the SSL support is implemented in the C-Client, we can also extend the zkpython with the SSL functionality. 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
7 weeks, 3 days ago 0|z079sw:
ZooKeeper ZOOKEEPER-3566

Send event zxid to watches

Improvement Open Major Unresolved Unassigned Samuel Nelson Samuel Nelson 02/Oct/19 00:36   04/Oct/19 10:55       server   0 3   The zxid that triggered a watch should be sent to the watch because it's useful for ordering events.

 

Use case:

I'm watching a znode and syncing its contents (and whether it has been deleted) to a third system. Without the zxid attached to events, it is very difficult to maintain the order of events as they happened in ZK.

 

For example if I modify node `/a/b/c` and then delete it soon after, we have two watch events, but no reliable way to communicate to our third system that the modification happened before the deletion. If we are given the zxid we can use that to order events.

 

Suggested implementation:

Change `IWatchManager#triggerWatch` to take another parameter `Long zxid`. Callers pass the zxid of the event.

Add member `Long zxid` to `WatchedEvent`
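The suggested API change can be sketched as follows; this `WatchedEvent` is a simplified stand-in for `org.apache.zookeeper.WatchedEvent`, and `happenedBefore` is a hypothetical convenience showing how consumers would order events:

```java
// Simplified stand-in carrying the triggering zxid on the event.
class WatchedEvent {
    final String type; // e.g. "NodeDataChanged", "NodeDeleted"
    final String path;
    final Long zxid;   // zxid of the transaction that triggered the watch

    WatchedEvent(String type, String path, Long zxid) {
        this.type = type;
        this.path = path;
        this.zxid = zxid;
    }

    /** True if this event is known to have happened before the other. */
    boolean happenedBefore(WatchedEvent other) {
        return zxid != null && other.zxid != null && zxid < other.zxid;
    }
}
```

With this, the modify-then-delete example from the use case becomes unambiguous: the delete event carries a strictly larger zxid.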

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks, 1 day ago 0|z077vk:
ZooKeeper ZOOKEEPER-3565

snapfile modified out of target directory during mvn testing

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 01/Oct/19 12:00   01/Oct/19 12:00   3.5.5   tests   0 0   I ran the mvn tests on 3.5 branch and then submitted a PR. I didn't notice that one of the snapfiles got caught up in the process, see this commit:

https://github.com/phunt/zookeeper/commit/44c7f93398aa47feea444afd2aaea4592324284e

Something seems broken with mvn test: modified (generated, etc.) files should be in target, not the mainline code.

See discussion here:

https://github.com/apache/zookeeper/pull/1102#issuecomment-537090502
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks, 2 days ago 0|z077bk:
ZooKeeper ZOOKEEPER-3564

org.apache.zookeeper.ClientCnxn EventThread Memory Problem

Bug Open Critical Unresolved Unassigned Hongcai Deng Hongcai Deng 30/Sep/19 22:07   08/Oct/19 03:22       java client   0 3   Recently I found some full GCs occurring in my Java app. I did a heap dump and found that

!image-2019-10-01-10-02-28-228.png!

The EventThread ate too much memory. I dug into the ZK code and found that

 
{code:java}
class EventThread extends ZooKeeperThread {
    private final LinkedBlockingQueue<Object> waitingEvents =
        new LinkedBlockingQueue<Object>();

    // code lines
}
{code}
waitingEvents is not bounded. Is this for some reason?
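One possible mitigation (a sketch only; the real EventThread queue is unbounded as shown above, and the capacity and drop policy here are illustrative) is to cap the queue so a stalled consumer surfaces back-pressure instead of unbounded heap growth:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative bounded variant of the EventThread's waitingEvents queue.
class BoundedEventQueue {
    private final LinkedBlockingQueue<Object> waitingEvents;

    BoundedEventQueue(int capacity) {
        this.waitingEvents = new LinkedBlockingQueue<>(capacity);
    }

    /** Returns false (and drops the event) when the queue is full. */
    boolean queueEvent(Object event) {
        return waitingEvents.offer(event);
    }

    int size() {
        return waitingEvents.size();
    }
}
```

Dropping watch events silently would change client semantics, so a real fix would more likely block, close the session, or at least log loudly when the cap is hit.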

 
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
23 weeks, 2 days ago 0|z076cg:
ZooKeeper ZOOKEEPER-3563

dependency check failing on 3.4 and 3.5 branches - CVE-2019-16869 on Netty

Bug Closed Blocker Fixed Unassigned Patrick D. Hunt Patrick D. Hunt 30/Sep/19 14:25   16/Oct/19 14:59 08/Oct/19 09:17 3.5.5, 3.4.14 3.6.0, 3.5.6 security   0 1 0 10800   The mvn dependency check is failing on 3.4 and 3.5:

3.4:
[ERROR] netty-3.10.6.Final.jar: CVE-2019-16869

3.5:
[ERROR] netty-transport-4.1.29.Final.jar: CVE-2019-16869
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
23 weeks, 2 days ago 0|z07600:
ZooKeeper ZOOKEEPER-3561

Generalize target authentication scheme for ZooKeeper authentication enforcement.

Improvement Open Major Unresolved Mohammad Arshad Michael Han Michael Han 26/Sep/19 19:41   03/Oct/19 14:45   3.6.0   server   0 2 0 1200   ZOOKEEPER-1634 introduced an option to let users enforce authentication for ZooKeeper clients, but the committed implementation enforced only the SASL scheme.

This JIRA is to generalize the authentication scheme such that the authentication enforcement on ZooKeeper clients could work with any supported authentication scheme.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks, 6 days ago 0|z072eg:
ZooKeeper ZOOKEEPER-3560

Add response cache to serve get children (2) requests.

Improvement Resolved Major Fixed Michael Han Michael Han Michael Han 26/Sep/19 02:24   18/Nov/19 03:29 18/Nov/19 03:06   3.6.0 server   0 2 0 10800   ZOOKEEPER-3180 introduced a response cache, but it only covers getData requests. This JIRA is to extend the response cache, based on the infrastructure set up by ZOOKEEPER-3180, so that responses to get children requests can also be served out of the cache. Some design decisions:

* Only OpCode.getChildren2 is supported, as OpCode.getChildren does not have associated stats, and the current cache infrastructure relies on stats to invalidate cache entries.

* The children list is stored in a separate response cache object so it does not pollute the existing data cache that's serving getData requests, and this separation also allows potential separate tuning of each cache based on workload characteristics.

* As a result of the cache object separation, new server metrics are added to measure cache hits / misses for get children requests, separate from those for get data requests.
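The design above can be sketched as a cache keyed by path and tagged with the cversion used for invalidation; a minimal illustration (class and method names are hypothetical, not ZooKeeper's actual implementation):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a stat-invalidated response cache for get-children
// requests, in the spirit described above. Names are illustrative only.
class ChildrenResponseCache {
    private static final class Entry {
        final int cversion;          // child-list version used for invalidation
        final List<String> children; // cached child names

        Entry(int cversion, List<String> children) {
            this.cversion = cversion;
            this.children = children;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns the cached child list, or null when absent or stale. */
    List<String> get(String path, int currentCversion) {
        Entry e = cache.get(path);
        return (e != null && e.cversion == currentCversion) ? e.children : null;
    }

    /** Stores a freshly computed child list tagged with its cversion. */
    void put(String path, int cversion, List<String> children) {
        cache.put(path, new Entry(cversion, children));
    }
}
```

A hit is served only when the node's current cversion matches the cached one, which is why OpCode.getChildren (no stat available) cannot participate.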
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 3 days ago 0|z0717k:
ZooKeeper ZOOKEEPER-3559

Update Jackson to 2.9.10

Bug Closed Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 25/Sep/19 14:37   16/Oct/19 14:59 27/Sep/19 07:22   3.6.0, 3.5.6     0 2 0 3600   Jackson should be updated to the latest version to pick up a fix for CVE-2019-14540 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks, 6 days ago 0|z070og:
ZooKeeper ZOOKEEPER-3558

Support authentication enforcement

New Feature Open Major Unresolved Mohammad Arshad Mohammad Arshad Mohammad Arshad 25/Sep/19 11:59   05/Feb/20 07:16     3.5.8     0 3   Provide authentication enforcement in ZooKeeper that is backward compatible and works with any authentication scheme, even custom ones.

*Problems:*
1. Currently the server starts with the default authentication providers (DigestAuthenticationProvider, IPAuthenticationProvider). These defaults are not really secure.
2. The ZooKeeper server does not check whether authentication has been completed before performing a user operation.

*Solutions:*
1. We should not start any authentication provider by default, but that would be a backward-incompatible change. Instead, we can add a configuration option controlling whether the default authentication providers are started, and enable them by default.
2. Before any user operation, the server should check whether authentication has happened; the client must be authenticated with at least one authentication scheme.
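The per-request check proposed in solution 2 could look roughly like this (the class, constructor flag, and method are illustrative assumptions, not the committed ZooKeeper API):

```java
import java.util.List;

// Hypothetical sketch of the proposed enforcement gate: when enforcement is
// enabled, reject requests from clients with no authenticated scheme.
class AuthEnforcement {
    private final boolean enforce; // would come from server configuration

    AuthEnforcement(boolean enforce) {
        this.enforce = enforce;
    }

    /** Applied before any user operation: require at least one authenticated scheme. */
    boolean isAllowed(List<String> authenticatedSchemes) {
        return !enforce || !authenticatedSchemes.isEmpty();
    }
}
```

Because the gate only asks "is the client authenticated with at least one scheme", it is scheme-agnostic and works for custom providers as well.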
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
24 weeks, 6 days ago 0|z070ko:
ZooKeeper ZOOKEEPER-3557

Towards a testable codebase

Task Open Major Unresolved Unassigned Zili Chen Zili Chen 25/Sep/19 11:56   17/Oct/19 03:29       tests   0 2   This umbrella issue tracks all efforts towards a testable ZooKeeper codebase.

*Motivation*

On the one hand, many of our adopters such as HBase, Curator and so on maintain their own testkit for ZooKeeper[1][2]; on the other hand, ZooKeeper itself doesn't have a well-designed testkit. Here are some of the issues in our testing "framework".

1. {{ZooKeeperTestable}} is a production-scope class while it should be in testing scope.
2. {{ZooKeeperTestable}} is only used in {{SessionTimeoutTest}}, although its name implies a complete testing utility.
3. {{ClientBase}} is the superclass of many ZooKeeper tests, yet it contains so many orthogonal functions that its subclasses inherit burdens they do not need.
4. Testing logic is injected casually, so we suffer from visibility chaos.
...

Because ZooKeeper doesn't provide a testkit, our adopters have to write ZK-related tests against fairly internal concepts. For example, HBase waits for the ZK server to launch using four-letter words, which caused issues when upgrading from ZK 3.4.x to ZK 3.5.5, where four-letter words are disabled by default.

[1] https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
[2] https://github.com/apache/curator/blob/master/curator-test/src/main/java/org/apache/curator/test/TestingCluster.java
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks ago 0|z070jk:
ZooKeeper ZOOKEEPER-3556

Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down

Wish Open Major Unresolved Unassigned Steven Chan Steven Chan 25/Sep/19 03:05   25/Sep/19 13:26   3.5.5   java client   0 2 43200 43200 0% *I encountered a problem which blocks my development of load balancing using ZooKeeper 3.5.5.*

 *I have a ZooKeeper cluster which comprises five zk servers, and the dynamic configuration file is as follows:*


{color:#FF0000}  *server.1=zk1:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.2=zk2:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.3=zk3:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.4=zk4:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.5=zk5:2888:3888:participant;0.0.0.0:2181*{color}


 *The zk cluster works fine when every member is healthy. However, if, say, two of them suddenly go down without notice, the dynamic configuration file shown above will not be synchronized dynamically, which causes the zk cluster to stop working normally.*

 *As far as I am concerned, if server 1 and server 5 go down suddenly, the dynamic configuration file should be modified as follows:*

{color:#FF0000}  *server.2=zk2:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.3=zk3:2888:3888:participant;0.0.0.0:2181*{color}

{color:#FF0000}  *server.4=zk4:2888:3888:participant;0.0.0.0:2181*{color}

*But in this case, the dynamic configuration file will never change automatically unless you revise it manually.*

 *I think this is a very common case which may happen at any time. If so, how can we handle it?*
0% 0% 43200 43200 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 1 day ago 0|z06zn4:
ZooKeeper ZOOKEEPER-3554

A way to connect to a different port in CLIENT_JVMFLAGS

Improvement Open Minor Unresolved Unassigned Agostino Sarubbo Agostino Sarubbo 23/Sep/19 06:36   23/Sep/19 06:36   3.5.5       0 2   I followed this article to enable ssl in zookeeper:
[https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]

When I need to connect via zkCli.sh I have to do:
./zkCli.sh -server localhost:2281

If I put those settings in CLIENT_JVMFLAGS it does not work, because in terms of ordering they need to come after the main class.

So, since I specify all settings in CLIENT_JVMFLAGS, it makes sense to have a property or something similar to specify the default host/port.

I didn't find a property to do that; the ones I found appear to be server properties (clientPortAddress/clientPort).
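One possible shape of the requested behavior: the CLI falls back to a client-side system property when no -server argument is given. The property name below is an assumption for illustration, not an existing ZooKeeper option:

```java
// Hypothetical sketch: resolve the connect string from an explicit -server
// argument, else from an assumed client-side property, else a default.
class DefaultServerResolver {
    static String resolve(String serverArg) {
        if (serverArg != null) {
            return serverArg; // an explicit -server argument wins
        }
        // "zookeeper.client.defaultConnectString" is a hypothetical property name
        return System.getProperty("zookeeper.client.defaultConnectString",
                                  "localhost:2181");
    }
}
```

With such a property set in CLIENT_JVMFLAGS, `./zkCli.sh` could be run without repeating `-server localhost:2281` every time.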
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 3 days ago 0|z06wxc:
ZooKeeper ZOOKEEPER-3553

Create Performance Test Tool

New Feature Open Major Unresolved Unassigned David Mollitor David Mollitor 20/Sep/19 09:59   20/Sep/19 09:59   3.6.0       0 3   Create a tool similar to [Dynamometer|https://github.com/linkedin/dynamometer] for ZooKeeper.

To use this tool, the operator should collect transaction log files from a live instance of ZooKeeper and supply them to the tool. The tool then simply replays all of the commands it finds in all of the supplied transaction logs, in order.
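The replay step described above can be sketched as: merge the records of all supplied logs and apply them strictly in zxid order. The Txn type below is an illustrative stand-in, not ZooKeeper's transaction-log API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical shape of the replay merge for a Dynamometer-like tool.
class TxnLogReplayer {
    static final class Txn {
        final long zxid;  // global order of the transaction
        final String op;  // e.g. "create", "setData" (illustrative)

        Txn(long zxid, String op) {
            this.zxid = zxid;
            this.op = op;
        }
    }

    /** Merges several transaction logs into one list, ordered by zxid for replay. */
    static List<Txn> mergeInOrder(List<List<Txn>> logs) {
        List<Txn> all = new ArrayList<>();
        for (List<Txn> log : logs) {
            all.addAll(log);
        }
        all.sort(Comparator.comparingLong((Txn t) -> t.zxid));
        return all;
    }
}
```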
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 6 days ago 0|z06v14:
ZooKeeper ZOOKEEPER-3552

Source tarball for branch-3.5 does not set execute permission on the "configure" file

Bug Closed Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 19/Sep/19 14:13   16/Oct/19 14:58 20/Sep/19 04:20 3.5.6 3.5.6 build, c client   0 1 0 3600   During the Rc0 VOTE of 3.5.6 we found that the 'configure' file inside the source tarball does not have the right permissions.

100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks ago 0|z06trc:
ZooKeeper ZOOKEEPER-3549

zookeeper auto purge bug

Bug Open Major Unresolved Unassigned yeshuangshuang yeshuangshuang 17/Sep/19 06:02   06/Mar/20 23:55   3.4.5       0 2 1209600 1209600 0% In ZooKeeper 3.4.5 I found a bug: the log.xxx files are not auto-purged at 65 MB; they grow to 1 GB. 0% 0% 1209600 1209600 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
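For reference, automatic purging of old snapshots and transaction logs is controlled by the standard autopurge settings in zoo.cfg; if purgeInterval is left at its default of 0, no automatic purge runs at all, which can look like the growth described here:

```
# zoo.cfg — standard ZooKeeper autopurge settings (available since 3.4.0)
autopurge.snapRetainCount=3   # most recent snapshots (and matching txn logs) to keep
autopurge.purgeInterval=1     # purge task interval in hours; 0 (the default) disables autopurge
```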
1 week, 5 days ago 0|z06pkw:
ZooKeeper ZOOKEEPER-3548

Redundant zxid check in SnapStream.isValidSnapshot

Improvement Resolved Minor Fixed Michael Han Michael Han Michael Han 16/Sep/19 23:46   26/Sep/19 12:49 26/Sep/19 08:20   3.6.0 server   0 2 0 3000   getZxidFromName is called twice in isValidSnapshot, and the second call is redundant and should be removed. 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
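The fix amounts to hoisting the duplicated call into a local variable. A minimal before/after sketch (method names mirror the report, bodies are stand-ins, not ZooKeeper's real code):

```java
// Illustrative sketch of removing the redundant getZxidFromName call.
class SnapStreamSketch {
    /** Parses the hex zxid suffix of a snapshot file name; -1 when unparsable. */
    static long getZxidFromName(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return -1L;
        }
        try {
            return Long.parseLong(name.substring(dot + 1), 16);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    static boolean isValidSnapshot(String name) {
        long zxid = getZxidFromName(name); // parse once, reuse below
        if (zxid == -1L) {
            return false;
        }
        // ... further checks would reuse 'zxid' instead of re-parsing ...
        return true;
    }
}
```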
25 weeks ago 0|z06p9c:
ZooKeeper ZOOKEEPER-3547

Add detailed documentation on throttling

Improvement Open Minor Unresolved Unassigned Jie Huang Jie Huang 16/Sep/19 12:21   14/Dec/19 06:08     3.7.0 documentation   0 1   From ZOOKEEPER-3492: Add weights to server side connection throttling

"However, given the size and impact of the feature, I'd really love to see a dedicated section in the documentation for throttling. Something similar to what I added for Quorum TLS: [https://github.com/apache/zookeeper/blob/master/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md#Quorum+TLS]

It'd be really nice to see detailed explanation of how this Blue-throttling stuff works, where does it come from, some links to literature for instance, etc. I also imagine a nice how-to about the right way of setting things up."
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks, 3 days ago 0|z06om0:
ZooKeeper ZOOKEEPER-3546

Containers that never have children stay forever

Bug Resolved Major Fixed Jordan Zimmerman Sylvain Wallez Sylvain Wallez 16/Sep/19 09:21   03/Dec/19 13:37 25/Nov/19 09:38 3.5.3, 3.5.5 3.6.0 server   0 1 0 25200   {{ContainerManager}} does not delete containers whose cversion is zero to avoid situations where a container would be deleted before the application had the chance to create children.

This caused issues in our application where the process stopped between container creation and child creation: the containers were never deleted.

To avoid this while giving applications the time to create children, empty containers with a cversion of zero should be deleted after a grace period, e.g. not when they are first collected, but the second time.
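The "delete on second sighting" grace period can be sketched as a candidate set: an empty container becomes deletable only if it was already a candidate on the previous collection pass (names are hypothetical, not the actual ContainerManager code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the grace-period logic described above.
class ContainerReaper {
    private final Set<String> candidates = new HashSet<>();

    /**
     * Called once per collection pass for each empty container (cversion == 0).
     * Returns true only when the container was already a candidate on the
     * previous pass, i.e. it survived one full grace period while empty.
     */
    boolean shouldDelete(String path) {
        return !candidates.add(path); // add() returns false if already present
    }

    /** Containers seen non-empty (or deleted) leave the candidate set. */
    void clear(String path) {
        candidates.remove(path);
    }
}
```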
100% 100% 25200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 3 days ago 0|z06odc:
ZooKeeper ZOOKEEPER-3545

Fix LICENSE files for netty dependency

Task Closed Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 14/Sep/19 11:54   16/Oct/19 14:59 16/Sep/19 17:28 3.6.0, 3.5.6 3.6.0, 3.5.6 build   0 2 0 3000   We have to fix the LICENSE files because 3.5.5 shipped netty-all, while 3.5.6 ships multiple netty JARs.
Our current LICENSE file layout is one file per JAR in "lib".
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks, 3 days ago 0|z06n8o:
ZooKeeper ZOOKEEPER-3542

X509UtilTest#testClientRenegotiationFails is flaky on JDK8 + linux on machines with 2 cores

Test Closed Critical Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 11/Sep/19 17:28   16/Oct/19 14:59 18/Sep/19 10:07 3.5.5 3.6.0, 3.5.6 build, tests   0 3 0 9600   On this Fedora machine:
[eolivelli@localhost zookeeper-server]$ uname -a
Linux localhost.localdomain 5.2.9-200.fc30.x86_64 #1 SMP Fri Aug 16 21:37:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[eolivelli@localhost zookeeper-server]$ mvn -v
Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T17:06:16+02:00)
Maven home: /home/eolivelli/Scaricati/maven
Java version: 1.8.0_222, vendor: AdoptOpenJDK, runtime: /home/eolivelli/dev/jdk8u222-b10/jre
Default locale: it_IT, platform encoding: UTF-8
OS name: "linux", version: "5.2.9-200.fc30.x86_64", arch: "amd64", family: "unix"


[eolivelli@localhost zookeeper-server]$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 112
model name : AMD A9-9410 RADEON R5, 5 COMPUTE CORES 2C+3G
stepping : 0
microcode : 0x6006704
cpu MHz : 1444.800
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5789.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro acc_power [13]

processor : 1
vendor_id : AuthenticAMD
cpu family : 21
model : 112
model name : AMD A9-9410 RADEON R5, 5 COMPUTE CORES 2C+3G
stepping : 0
microcode : 0x6006704
cpu MHz : 1483.889
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 17
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5789.50
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro acc_power [13]


[INFO] Running org.apache.zookeeper.common.X509UtilTest
[ERROR] Tests run: 336, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 12.382 s <<< FAILURE! - in org.apache.zookeeper.common.X509UtilTest
[ERROR] testClientRenegotiationFails[1](org.apache.zookeeper.common.X509UtilTest) Time elapsed: 0.103 s <<< ERROR!
java.lang.Exception: Unexpected exception, expected<javax.net.ssl.SSLHandshakeException> but was<java.lang.AssertionError>
at org.apache.zookeeper.common.X509UtilTest.testClientRenegotiationFails(X509UtilTest.java:575)

[ERROR] testClientRenegotiationFails[4](org.apache.zookeeper.common.X509UtilTest) Time elapsed: 0.064 s <<< ERROR!
java.lang.Exception: Unexpected exception, expected<javax.net.ssl.SSLHandshakeException> but was<java.lang.AssertionError>
at org.apache.zookeeper.common.X509UtilTest.testClientRenegotiationFails(X509UtilTest.java:575)


I see this test failing very often:

[ERROR] testClientRenegotiationFails[6](org.apache.zookeeper.common.X509UtilTest) Time elapsed: 0.046 s <<< ERROR!
java.lang.Exception: Unexpected exception, expected<javax.net.ssl.SSLHandshakeException> but was<java.lang.AssertionError>
at org.apache.zookeeper.common.X509UtilTest.testClientRenegotiationFails(X509UtilTest.java:575)

[ERROR] testClientRenegotiationFails[7](org.apache.zookeeper.common.X509UtilTest) Time elapsed: 0.06 s <<< ERROR!
java.lang.Exception: Unexpected exception, expected<javax.net.ssl.SSLHandshakeException> but was<java.lang.AssertionError>
at org.apache.zookeeper.common.X509UtilTest.testClientRenegotiationFails(X509UtilTest.java:575)
100% 100% 9600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks ago 0|z06k5c:
ZooKeeper ZOOKEEPER-3541

Wrong placeholder '{}' in logs.

Bug Open Minor Unresolved Unassigned hu xiaodong hu xiaodong 10/Sep/19 01:16   18/Sep/19 09:30   3.5.5   server   0 1 0 6600   In the method 'org.apache.zookeeper.client.ZooKeeperSaslClient#respondToServer', 
{code:java}
 LOG.error("SASL authentication failed using login context '"
+ this.getLoginContext()
+ "' with exception: {}", e); {code}
// I think '{}' above is wrong. It's redundant.
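The reporter is right: with the string concatenated message and a single exception argument, the two-argument error(String, Throwable) overload is selected, so the exception's stack trace is logged but '{}' is never substituted and appears literally in the output. The fix is simply to drop the placeholder, e.g. LOG.error("SASL authentication failed using login context '" + this.getLoginContext() + "'", e). A small pure-Java illustration of the placeholder-counting convention (helper name is hypothetical):

```java
// Counts SLF4J-style '{}' placeholders in a message template; a Throwable
// passed as the final argument needs no placeholder of its own.
class PlaceholderCheck {
    static int countPlaceholders(String template) {
        int count = 0;
        for (int i = 0; i + 1 < template.length(); i++) {
            if (template.charAt(i) == '{' && template.charAt(i + 1) == '}') {
                count++;
                i++; // skip the '}' we just matched
            }
        }
        return count;
    }
}
```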

 

!image-2019-09-10-14-02-30-306.png!
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
27 weeks, 2 days ago 0|z06hog:
ZooKeeper ZOOKEEPER-3540

Client port unavailable after binding the same client port during reconfig

Bug Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 09/Sep/19 12:41   10/Sep/19 20:58 10/Sep/19 12:51 3.6.0 3.6.0 server   0 2 0 2400   When dynamically replacing a server's IPv4/IPv6 address while keeping the same client port, the server complains about 'address already in use', and the client port becomes unavailable. 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 1 day ago 0|z06h2g:
ZooKeeper ZOOKEEPER-3539

Fix branch-3.5 after upgrade on ASF CI

Task Closed Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 08/Sep/19 04:06   16/Dec/19 08:03 09/Sep/19 07:49 3.5.5 3.5.6 build, build-infrastructure   0 1 0 7800   ASF CI now lacks "findbugs" tool.
ASF CI upgraded gcc and now there are some errors related to the usage of NULL in calls of fprintf.

We should:
- disable findbugs on CI ant-based tasks
- use LOG_DEBUG consistently with the rest of code in C client

Please note that on branch-3.5 we are officially using Maven and Spotbugs; the ant build is kept only for compatibility, and findbugs is not strictly needed to check code quality on branch-3.5.

branch-3.4 builds only with Ant; let's treat that problem as a separate task.
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 3 days ago 0|z06fpc:
ZooKeeper ZOOKEEPER-3538

test

Test Resolved Trivial Invalid Unassigned adinda yukari adinda yukari 07/Sep/19 22:27   13/Sep/19 00:14 13/Sep/19 00:14 3.4.10   jute, quorum   0 1
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 4 days ago PokerQQ, BandarQQ, DominoQQ, AgenQQ, SitusQQ, QQ, PokerQ, BandarQ, DominoQ, AgenQ, SitusQ 0|z06fo0:
ZooKeeper ZOOKEEPER-3537

Leader election - Use of out of election messages

Improvement Resolved Trivial Fixed Karolos Antoniadis Karolos Antoniadis Karolos Antoniadis 07/Sep/19 19:34   30/Sep/19 21:19 30/Sep/19 17:22   3.6.0     0 3 0 6600   Hello ZooKeeper developers,

in {{lookForLeader}} in {{FastLeaderElection}} there is the following switch block in case a notification message {{n}} is received where {{n.state}} is either {{FOLLOWING}} or {{LEADING}} ([https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1029]).
{code:java}
case FOLLOWING:
case LEADING:
    /*
     * Consider all notifications from the same epoch
     * together.
     */
    if (n.electionEpoch == logicalclock.get()) {
        recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));
        voteSet = getVoteTracker(recvset, new Vote(n.version, n.leader, n.zxid, n.electionEpoch, n.peerEpoch, n.state));
        if (voteSet.hasAllQuorums() && checkLeader(outofelection, n.leader, n.electionEpoch)) {
            setPeerState(n.leader, voteSet);
            Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
            leaveInstance(endVote);
            return endVote;
        }
    }

    /*
     * Before joining an established ensemble, verify that
     * a majority are following the same leader.
     */
    outofelection.put(n.sid, new Vote(n.version, n.leader, n.zxid, n.electionEpoch, n.peerEpoch, n.state));
    voteSet = getVoteTracker(outofelection, new Vote(n.version, n.leader, n.zxid, n.electionEpoch, n.peerEpoch, n.state));

    if (voteSet.hasAllQuorums() && checkLeader(outofelection, n.leader, n.electionEpoch)) {
        synchronized (this) {
            logicalclock.set(n.electionEpoch);
            setPeerState(n.leader, voteSet);
        }
        Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
        leaveInstance(endVote);
        return endVote;
    }
    break;{code}
 

We notice that when {{n.electionEpoch == logicalclock.get()}}, votes are being added into {{recvset}}, however {{checkLeader}} is called immediately afterwards with the votes in {{outofelection}} as can be seen here ([https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1037]).

Checking {{outofelection}} instead of {{recvset}} does not cause any problems.
If {{checkLeader}} on {{outofelection}} fails, although it would have succeeded in {{recvset}}, {{checkLeader}} succeeds immediately afterwards when the vote is added in {{outofelection}}.
Still, it seems natural to check for a leader in {{recvSet}} and not in {{outofelection}}. 



Cheers,
Karolos

 
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
24 weeks, 2 days ago 0|z06fmg:
ZooKeeper ZOOKEEPER-3536

On Windows maven build generates corrupted tarball

Bug Resolved Minor Invalid Unassigned Mohammad Arshad Mohammad Arshad 05/Sep/19 23:13   20/Sep/19 06:28 20/Sep/19 06:28 3.5.5   build   0 3   On Windows, the maven command {code}mvn clean install -DskipTests{code} creates corrupted tarballs.
In zookeeper-assembly/pom.xml, <tarLongFileMode>posix</tarLongFileMode> causes the problem. Many people use Windows as a development environment; it would be better if we could make the tarLongFileMode property configurable or select it based on the OS.
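One possible shape of the OS-based selection, sketched only: it assumes the assembly plugin's <tarLongFileMode> is parameterized via a Maven property (which is not how the current pom is written), and uses a profile activated on the Windows OS family:

```
<!-- Sketch: choose tarLongFileMode per OS, assuming the assembly plugin
     configuration reads ${tar.longFileMode} (a hypothetical property). -->
<properties>
  <tar.longFileMode>posix</tar.longFileMode>
</properties>
<profiles>
  <profile>
    <id>windows-build</id>
    <activation>
      <os><family>windows</family></os>
    </activation>
    <properties>
      <tar.longFileMode>gnu</tar.longFileMode>
    </properties>
  </profile>
</profiles>
```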

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 6 days ago 0|z06dnc:
ZooKeeper ZOOKEEPER-3535

support Cluster Id to identify an ensemble

New Feature Open Major Unresolved maoling maoling maoling 05/Sep/19 22:27   26/Dec/19 21:50     3.7.0 documentation, server   0 1   Every new zk cluster generates a new cluster ID based on the initial cluster configuration and a user-provided unique initial-cluster-token value. By having unique cluster ID’s, zk is protected from cross-cluster interaction which could corrupt the cluster.

Usually this warning happens after tearing down an old cluster and then reusing some of the peer addresses for the new cluster. If any zk process from the old cluster is still running, it will try to contact the new cluster. The new cluster will recognize a cluster ID mismatch, ignore the request, and emit a warning. The warning is often cleared by ensuring peer addresses among distinct clusters are disjoint.
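The cross-cluster guard described above can be sketched as a simple check applied to every peer message (names are illustrative, not an existing ZooKeeper API):

```java
// Hypothetical sketch: peer messages would carry the sender's cluster ID,
// and messages from a different cluster are ignored.
class ClusterIdGuard {
    private final String localClusterId;

    ClusterIdGuard(String localClusterId) {
        this.localClusterId = localClusterId;
    }

    /** Returns true when a peer message may be processed. */
    boolean accept(String remoteClusterId) {
        return localClusterId.equals(remoteClusterId);
    }
}
```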
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 6 days ago 0|z06dkg:
ZooKeeper ZOOKEEPER-3534

Non-stop communication between participants and observers.

Bug Open Minor Unresolved Unassigned Karolos Antoniadis Karolos Antoniadis 03/Sep/19 13:08   04/Sep/19 19:40           0 3   Hello ZooKeeper developers,

there are cases during *leader election*, where there is non-stop communication between observers and participants.
This communication occurs as follows:
- an observer sends a notification to a participant
- the participant responds
- an observer sends another notification and so on and so forth ...

It is possible that an observer-participant pair exchange hundreds of notification messages in the span of one second. As a consequence, the system is burdened with unnecessary load, and the logs are filled with useless information as can be seen below:

 
{noformat}
2019-09-03 16:37:22,630 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x100000000
2019-09-03 16:37:22,632 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x100000000
2019-09-03 16:37:22,633 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x100000000
2019-09-03 16:37:22,635 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x100000000
2019-09-03 16:37:22,635 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection@692] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x100000000{noformat}
 

 
h4. Why does the non-stop communication bug occur?

This bug stems from the fact that when a participant receives a notification from an observer, the participant responds right away, as can be seen [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L325] - it is even written in the comments. Now, when the observer receives back the message from the participant there are 2 cases that could lead to non-stop communication:
1) The observer has a greater {{logicalclock}} than the participant and both the observer and the participant are in a {{LOOKING}} state. In such a case, the observer responds right away to the participant as can be seen [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L392].
2) The observer is {{OBSERVING}} while the participant is still {{LOOKING}}, then the non-stop communication ensues due to the code in [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L413].  
h4. How can we reproduce this non-stop communication bug?

It is not trivial to reproduce this bug, although we saw it occurring in the wild. To reproduce this bug, we provide a script that utilizes docker and that can be used to easily debug ZK code. The script starts a ZK cluster with 3 participants (P1, P2, P3) and 2 observers (O1, O2). The script together with instructions on how to use it can be found [here|https://github.com/insumity/zookeeper_debug_tool].

 

Using the script, there are at least 2 ways to reproduce the bug:
1) We can artificially delay the leader election by introducing the following code in {{FastLeaderElection}} (in [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1006]).

 
{code:java}
// Verify if there is any change in the proposed leader
int time = finalizeWait;
if (self.getId() >= 1 && self.getId() <= 3) {
time = 2000;
}{code}
 

and changing the immediate succeeding line:
{code:java}
while ((n = recvqueue.poll(finalizeWait, TimeUnit.MILLISECONDS)) != null) {code}
to

 
{code:java}
while ((n = recvqueue.poll(time, TimeUnit.MILLISECONDS)) != null) { 
{code}
Now, if we run a ZK cluster and force a leader election by killing the leader, we see the non-stop communication occurring. The reason is that, as a result of this delay, the observer restarts (increments its {{logicalclock}}), tries to connect to the previous leader, but fails since the previous leader has crashed; the observer then restarts by incrementing {{logicalclock}} once more, and hence starts the non-stop communication.


2) Another way to reproduce the bug is by creating a network partition that partitions P1 from P2, P3, O2 but that still keeps participant P1 connected to observer O1. In such a case, the non-stop communication ensues since O1 is {{OBSERVING}} while P1 remains in a {{LOOKING}} state. To reproduce this bug, using the above script, someone just has to do:
*  wait till the ZK cluster starts running
*  in your local machine do ./create_np_case_3.sh (attached file in this issue)
*  force a leader election by restarting the leader (most likely the leader is server 3)


It is true that scenario 2 is slightly unrealistic. However, the first scenario where leader election takes too much time to complete is pretty realistic.  Whenever we saw this non-stop communication bug, it was because leader election took too long to complete. For instance, it could occur if there is some type of split-vote during LE and the elected leader times out while
{noformat}
waiting for epoch from quorum {noformat}
[here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L1350].

 
h4. How can we fix this issue?

One idea would be that before an observer starts observing a leader, it verifies that the leader is up and running using a check similar to {{checkLeader}} as is done [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L1037].
This will prevent non-stop communication between observers and participants during long leader elections, since observers would not try to connect to an already failed leader and hence would not increase their {{logicalclock}}. However, this fix on its own does not solve the second reproduction scenario described above.

Best Regards,
Karolos

 
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
28 weeks, 1 day ago 0|z06a48:
ZooKeeper ZOOKEEPER-3533

Create systemic alerts for zookeeper

Task Open Critical Unresolved Unassigned Anson Qian Anson Qian 03/Sep/19 12:45   03/Sep/19 13:48           0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
28 weeks, 2 days ago 0|z06a3c:
ZooKeeper ZOOKEEPER-3532

Provide a docker-based environment to work on a known OS

Improvement Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 03/Sep/19 08:21   09/Sep/19 22:50 09/Sep/19 18:33   3.6.0 build   1 2 0 6600   We can have a docker-based environment to launch a container with a known version of Linux, Java, Maven, gcc and all of the other libraries.
This way it is easier to work on macOS, and in the future we could have a known environment to build releases and have reproducible builds even for native code.

The idea is taken from the Apache BookKeeper project.
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 2 days ago 0|z069u8:
ZooKeeper ZOOKEEPER-3531

Synchronization on ACLCache cause cluster to hang when network/disk issues happen during datatree serialization

Bug Resolved Critical Fixed Chang Lou Chang Lou Chang Lou 02/Sep/19 17:02   09/Oct/19 15:28 09/Oct/19 10:39 3.5.2, 3.5.3, 3.5.4, 3.5.5 3.6.0     0 4 0 10200   During our ZooKeeper fault injection testing, we observed that sometimes the ZK cluster could hang (requests time out, node status shows ok). After inspecting the issue, we believe this is caused by I/O (serializing ACLCache) inside a critical section. The bug is essentially similar to what is described in ZooKeeper-2201.

org.apache.zookeeper.server.DataTree#serialize calls aclCache.serialize when doing datatree serialization. However, org.apache.zookeeper.server.ReferenceCountedACLCache#serialize could get stuck at OutputArchive.writeInt due to potential network/disk issues. This can cause the system to experience hanging issues similar to ZooKeeper-2201 (any attempt to create/delete/modify a DataNode will cause the leader to hang at the beginning of the request processor chain). The root cause is the lock contention between:
* org.apache.zookeeper.server.DataTree#serialize -> org.apache.zookeeper.server.ReferenceCountedACLCache#serialize 
* PrepRequestProcessor#getRecordForPath -> org.apache.zookeeper.server.DataTree#getACL(org.apache.zookeeper.server.DataNode) -> org.apache.zookeeper.server.ReferenceCountedACLCache#convertLong

When the snapshot gets stuck in ACL serialization, it blocks all other operations on ReferenceCountedACLCache. Since getRecordForPath calls ReferenceCountedACLCache#convertLong, any op triggering getRecordForPath will cause the leader to hang at the beginning of the request processor chain:
{code:java}
org.apache.zookeeper.server.ReferenceCountedACLCache.convertLong(ReferenceCountedACLCache.java:87)
org.apache.zookeeper.server.DataTree.getACL(DataTree.java:734)
   - locked org.apache.zookeeper.server.DataNode@4a062b7d
org.apache.zookeeper.server.ZKDatabase.aclForNode(ZKDatabase.java:371)
org.apache.zookeeper.server.PrepRequestProcessor.getRecordForPath(PrepRequestProcessor.java:170)
   - locked java.util.ArrayDeque@3f7394f7
org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:417)
org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:757)
org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:145)
{code}
Similar to ZooKeeper-2201, the leader can still send out heartbeats so the cluster will not recover until the network/disk issue resolves.  

Steps to reproduce this bug:
# start a cluster with 1 leader and n followers
# manually create some ACLs to enlarge the window of dumping ACLs, making it more likely to hang at serializing the ACLCache when a delay happens (we wrote a script to generate such workloads; see attachments)
# inject long network/disk write delays and run some benchmarks to trigger snapshots
# once stuck, you should observe that new requests to the cluster fail.

Essentially, the core problem is that the OutputArchive write should not be inside the synchronized block. So a straightforward solution is to move the writes out of the sync block: make a copy inside the sync block and perform the vulnerable network writes afterwards. A patch for this solution is attached and verified. Another, more systematic fix would be to replace all the synchronized methods in ReferenceCountedACLCache with a ConcurrentHashMap.
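The copy-then-write idea can be sketched as follows (a hedged illustration with made-up names, not the real {{ReferenceCountedACLCache}} API): the critical section only copies the map, so a slow writer no longer blocks {{convertLong}}-style readers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class AclCacheSketch {
    private final Map<Long, String> longKeyMap = new HashMap<>();

    public synchronized void put(long key, String acl) {
        longKeyMap.put(key, acl);
    }

    // Copy under the lock: a short, I/O-free critical section.
    private synchronized Map<Long, String> snapshot() {
        return new HashMap<>(longKeyMap);
    }

    // Serialize without the lock, so a stuck writer cannot hang readers.
    public void serialize(BiConsumer<Long, String> writer) {
        Map<Long, String> copy = snapshot();
        copy.forEach(writer); // potentially slow network/disk I/O happens here
    }

    public synchronized String convertLong(long key) {
        return longKeyMap.get(key); // no longer contends with in-flight I/O
    }
}
```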

We double-checked that the issue remains in the latest version of the master branch (68c21988d55c57e483370d3ee223c22da2d1bbcf).

Attachments: 1) a patch with the fix and a regression test; 2) scripts to generate workloads that fill the ACL cache.
100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
23 weeks, 1 day ago 0|z069bk:
ZooKeeper ZOOKEEPER-3530

Include compiled C-client in the binary tarball

Improvement Resolved Major Fixed Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 02/Sep/19 10:24   24/Oct/19 06:14 24/Oct/19 02:50 3.6.0, 3.5.7 3.6.0     0 2 0 4200   During the old ZooKeeper 3.4 ant builds ({{ant package-native}}), there was an artifact ({{zookeeper-<version>-lib.tar.gz}}) created just for the C-client, with the following content:
{code:bash}
usr
|--- bin
|    |--- cli_mt
|    |--- cli_st
|    |--- load_gen
|--- include
|    |--- zookeeper
|         |--- proto.h
|         |--- recordio.h
|         |--- zookeeper.h
|         |--- zookeeper.jute.h
|         |--- zookeeper_log.h
|         |--- zookeeper_version.h
|--- lib
     |--- libzookeeper_mt.a
     |--- libzookeeper_mt.la
     |--- libzookeeper_mt.so
     |--- libzookeeper_mt.so.2
     |--- libzookeeper_mt.so.2.0.0
     |--- libzookeeper_st.a
     |--- libzookeeper_st.la
     |--- libzookeeper_st.so
     |--- libzookeeper_st.so.2
     |--- libzookeeper_st.so.2.0.0
{code}
Currently, with Maven, the C-client is not archived when we generate a tarball during the full-build. In [PR-1078|https://github.com/apache/zookeeper/pull/1078] we discussed that we should re-introduce the {{apache-zookeeper-<version>-lib.tar.gz}} artifact.

The goals of this task are:
* re-introduce the 'lib' artifact with the same structure as the older ZooKeeper 3.4.x ant-generated artifact
* also add the LICENSE.txt file to the archive (it was missing from the 3.4.x tar.gz file)
* generate the new artifact only when the full-build profile is set for Maven
* also update the README_packaging.md file
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks ago 0|z06920:
ZooKeeper ZOOKEEPER-3529

ZOOKEEPER-3282 add a new doc: zookeeperUseCases.md

Sub-task Resolved Major Fixed maoling maoling maoling 02/Sep/19 04:15   23/Sep/19 18:00 23/Sep/19 16:22   3.6.0 documentation   0 2 0 14400     Write the Use Cases section [2.5], which includes:

 - moving the content from [https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy] into it

 - adding some new content (e.g. Apache projects: Spark; companies: Twitter, Facebook)
100% 100% 14400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
25 weeks, 3 days ago 0|z068k8:
ZooKeeper ZOOKEEPER-3528

ZOOKEEPER-3469 Revisit AsyncCallback javadoc

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 31/Aug/19 04:31   05/Sep/19 19:13 05/Sep/19 16:59 3.6.0 3.6.0 documentation   0 3 0 6600   100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
28 weeks ago 0|z067rk:
ZooKeeper ZOOKEEPER-3527

add a series of docker thing about zookeeper

Improvement Resolved Major Invalid Unassigned maoling maoling 30/Aug/19 04:06   17/Oct/19 22:07 17/Oct/19 22:07 3.6.0   documentation, scripts, server   0 2   * Add a dockerfile and all the related docker stuff to the zk trunk
* Add a link to the [official zookeeper image|https://hub.docker.com/_/zookeeper] in the ZK official documentation (*README.md*)
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
21 weeks, 6 days ago 0|z066lk:
ZooKeeper ZOOKEEPER-3526

When a ZK node has just become the leader and synchronizes with a follower for the first time, the value of maxCommittedLog may be smaller than the value of minCommittedLog; is that a problem?

Bug Open Major Unresolved Unassigned wanglei wanglei 30/Aug/19 00:56   27/Feb/20 05:59   3.4.14   server   0 2   1. Version: 3.4.14

2. Number of ZK nodes: 3
Node1: 1566815238 (myid), Node2: 1566815239 (myid), Node3: 1566815240 (myid)

After the election, Node3 becomes the new leader and begins to sync with the followers:

 

*2019-08-27 04:26:09,521 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:ZooKeeperServer@910][] - Refusing session request for not-read-only client /172.28.0.3:38994*
*2019-08-27 04:26:09,609 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:ZooKeeperServer@502][] - shutting down*
*2019-08-27 04:26:09,609 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:SessionTrackerImpl@226][] - Shutting down*
*2019-08-27 04:26:09,609 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:PrepRequestProcessor@769][] - Shutting down*
*2019-08-27 04:26:09,609 [myid:1566815240] - INFO [ReadOnlyRequestProcessor:1566815240:ReadOnlyRequestProcessor@111][] - ReadOnlyRequestProcessor exited loop!*
*2019-08-27 04:26:09,610 [myid:1566815240] - INFO [ProcessThread(sid:1566815240 cport:-1)::PrepRequestProcessor@144][] - PrepRequestProcessor exited loop!*
*2019-08-27 04:26:09,610 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:FinalRequestProcessor@430][] - shutdown of request processor complete*
*2019-08-27 04:26:09,613 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:QuorumPeer@992][] - LEADING*
*2019-08-27 04:26:09,615 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:Leader@64][] - TCP NoDelay set to: true*
*2019-08-27 04:26:09,616 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:ZooKeeperServer@174][] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 300000 datadir /opt/fusionplatform/data/zookeeper/data/version-2 snapdir /opt/fusionplatform/data/zookeeper/data/version-2*
*2019-08-27 04:26:09,616 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:Leader@380][] - {color:#FF0000}LEADING - LEADER ELECTION TOOK - 15297{color}*
*2019-08-27 04:26:09,956 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxnFactory@222][] - Accepted socket connection from /172.28.0.3:39012*
*2019-08-27 04:26:09,956 [myid:1566815240] - WARN [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxn@383][] - Exception causing close of session 0x0: ZooKeeperServer not running*
*2019-08-27 04:26:09,974 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxnFactory@222][] - Accepted socket connection from /172.28.0.2:50732*
*2019-08-27 04:26:09,974 [myid:1566815240] - WARN [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxn@383][] - Exception causing close of session 0x0: ZooKeeperServer not running*
*2019-08-27 04:26:10,513 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxnFactory@222][] - Accepted socket connection from /172.28.0.5:60010*
*2019-08-27 04:26:10,514 [myid:1566815240] - WARN [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxn@383][] - Exception causing close of session 0x0: ZooKeeperServer not running*
*2019-08-27 04:26:10,516 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxnFactory@222][] - Accepted socket connection from /172.28.0.5:60020*
*2019-08-27 04:26:10,517 [myid:1566815240] - WARN [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxn@383][] - Exception causing close of session 0x0: ZooKeeperServer not running*
*2019-08-27 04:26:10,530 [myid:1566815240] - INFO [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxnFactory@222][] - Accepted socket connection from /172.28.0.5:60024*
*2019-08-27 04:26:10,531 [myid:1566815240] - WARN [NIOServerCxn.Factory:/172.28.8.123:9880:NIOServerCnxn@383][] - Exception causing close of session 0x0: ZooKeeperServer not running*
*2019-08-27 04:26:10,619 [myid:1566815240] - INFO [LearnerHandler-/172.28.0.2:59666:LearnerHandler@346][] - Follower sid: 1566815238 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@6f38a687*
*2019-08-27 04:26:10,747 [myid:1566815240] - INFO [LearnerHandler-/172.28.0.2:59666:LearnerHandler@401][] -{color:#FF0000} Synchronizing with Follower sid: 1566815238 maxCommittedLog=0x3 minCommittedLog=0x9000002d9 peerLastZxid=0x9000004ca{color}*
*2019-08-27 04:26:10,747 [myid:1566815240] - INFO [LearnerHandler-/172.28.0.2:59666:LearnerHandler@410][] - leader and follower are in sync, zxid=0x9000004ca*
*2019-08-27 04:26:10,748 [myid:1566815240] - INFO [LearnerHandler-/172.28.0.2:59666:LearnerHandler@475][] - Sending DIFF*
*2019-08-27 04:26:10,811 [myid:1566815240] - INFO [SessionTracker:SessionTrackerImpl@163][] - SessionTrackerImpl exited loop!*
*2019-08-27 04:26:10,833 [myid:1566815240] - INFO [LearnerHandler-/172.28.0.2:59666:LearnerHandler@535][] - Received NEWLEADER-ACK message from 1566815238*
*2019-08-27 04:26:10,833 [myid:1566815240] - INFO [QuorumPeer[myid=1566815240]/172.28.8.123:9880:Leader@964][] - Have quorum of supporters, sids: [ 1566815238,1566815240 ]; starting up and setting last processed zxid: 0xa00000000*
*2019-08-27 04:26:11,160 [myid:1566815240] - INFO [SyncThread:1566815240:FileTxnLog@216][] - Creating new log file: log.a00000001*

{color:#FF0000}maxCommittedLog=0x3 minCommittedLog=0x9000002d9 peerLastZxid=0x9000004ca{color}

{color:#ff0000}*why maxCommittedLog < minCommittedLog?*{color}

2. Node2 (the follower) gets a TRUNC message from the leader. The leader zxid in the TRUNC message is 0x3, so Node2 truncates its transaction log (every zxid bigger than 0x3 is deleted). As a result, the data on Node2 is inconsistent.

 

*2019-08-27 04:26:14,225 [myid:1566815239] - INFO [WorkerReceiver[myid=1566815239]:FastLeaderElection@595][] - Notification: 1 (message format version), 1566815240 (n.leader), 0x9000004ca (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1566815238 (n.sid), 0xa (n.peerEpoch) LOOKING (my state)*
*2019-08-27 04:26:14,226 [myid:1566815239] - INFO [WorkerReceiver[myid=1566815239]:FastLeaderElection@595][] - Notification: 1 (message format version), 1566815240 (n.leader), 0x9000004ca (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1566815238 (n.sid), 0xa (n.peerEpoch) FOLLOWING (my state)*
*2019-08-27 04:26:14,226 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:QuorumPeer@980][] - FOLLOWING*
*2019-08-27 04:26:14,226 [myid:1566815239] - INFO [Thread-1:QuorumPeer$1@936][] - Interrupted while attempting to start ReadOnlyZooKeeperServer, not started*
*2019-08-27 04:26:14,229 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:Learner@86][] - TCP NoDelay set to: true*
*2019-08-27 04:26:14,229 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:ZooKeeperServer@174][] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 300000 datadir /opt/fusionplatform/data/zookeeper/data/version-2 snapdir /opt/fusionplatform/data/zookeeper/data/version-2*
*2019-08-27 04:26:14,230 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:Follower@65][] - {color:#FF0000}FOLLOWING - LEADER ELECTION TOOK - 36{color}*
*2019-08-27 04:26:14,232 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:QuorumPeer$QuorumServer@185][] - Resolved hostname: 172.28.8.123 to address: /172.28.8.123*
*2019-08-27 04:26:14,346 [myid:1566815239] - WARN [QuorumPeer[myid=1566815239]/172.28.8.122:9880:Learner@349][] - {color:#FF0000}Truncating log to get in sync with the leader 0x3{color}*
*2019-08-27 04:26:14,371 [myid:1566815239] - INFO [QuorumPeer[myid=1566815239]/172.28.8.122:9880:DataTree@715][] - type: create, sessionid:0x10000080a040001 cxid:0x4 zxid:0x3 reqpath:/cps*
*2019-08-27 04:26:14,374 [myid:1566815239] - WARN [QuorumPeer[myid=1566815239]/172.28.8.122:9880:Learner@387][] - Got zxid 0xa00000001 expected 0x1*
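As a rough illustration of why the inverted window matters, here is a simplified, hypothetical sketch of the leader-side sync decision (the real {{LearnerHandler}} logic also considers epochs and on-disk txn logs; this is not the actual code):

```java
public class SyncDecisionSketch {
    public enum Sync { DIFF, TRUNC, SNAP }

    // Simplified decision based only on the committed-log window.
    public static Sync decide(long minCommittedLog, long maxCommittedLog, long peerLastZxid) {
        if (peerLastZxid == maxCommittedLog) return Sync.DIFF; // already in sync
        if (peerLastZxid > maxCommittedLog) return Sync.TRUNC; // follower "ahead"
        if (peerLastZxid >= minCommittedLog) return Sync.DIFF; // replay the gap
        return Sync.SNAP;                                      // too far behind
    }
}
```

With the bogus window from the log above (maxCommittedLog=0x3, minCommittedLog=0x9000002d9), a follower that is actually up to date appears to be "ahead" of maxCommittedLog and is told to truncate to 0x3, which matches the inconsistency described here.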

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 2 days ago 0|z066dc:
ZooKeeper ZOOKEEPER-3525

Add project status badges to README

Improvement Resolved Trivial Fixed Unassigned John Tran John Tran 28/Aug/19 23:46   29/Aug/19 06:16 29/Aug/19 01:16   3.6.0     0 2 0 1200   Other projects like [Spark|https://github.com/apache/spark] and [Hive|https://github.com/apache/hive] have status badges on their READMEs. I'd like to start contributing to ZooKeeper so I think this is a simple first task.


100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
29 weeks ago 0|z0652w:
ZooKeeper ZOOKEEPER-3524

LearnerMetricsTest.testLearnerMetricsTest is flaky again

Bug Resolved Major Duplicate Mate Szalay-Beko Mate Szalay-Beko Mate Szalay-Beko 28/Aug/19 10:08   20/Sep/19 05:13 20/Sep/19 05:13         0 1   We had an earlier fix for this test in ZOOKEEPER-3470, but it looks like it is failing again.

I haven't found any failures on the [zookeeper trunk job|https://builds.apache.org/view/ZK%20All/job/ZooKeeper-trunk], but it does fail from time to time on the [precommit maven jobs |https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/org.apache.zookeeper$zookeeper/1171/testReport/junit/org.apache.zookeeper.server.quorum/LearnerMetricsTest/testLearnerMetricsTest/] or on [zookeeper-master-maven|https://builds.apache.org/view/ZK%20All/job/zookeeper-master-maven/lastCompletedBuild/org.apache.zookeeper$zookeeper/testReport/org.apache.zookeeper.server.quorum/LearnerMetricsTest/testLearnerMetricsTest/].

Could this be Maven-related somehow? That would be strange...
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 6 days ago 0|z064go:
ZooKeeper ZOOKEEPER-3523

Replace dummy watcher with a unified singleton

Improvement Resolved Major Fixed Zili Chen Zili Chen Zili Chen 27/Aug/19 12:49   18/Sep/19 08:28 18/Sep/19 05:29 3.6.0 3.6.0 server, tests   0 3 0 7200   Currently we have many anonymous subclasses of {{Watcher}}, all of which are just {{event -> { }}}. We can remove many subclasses from the inheritance tree by adding a {{DUMMY_WATCHER}} to the {{Watcher}} interface and replacing all of those usages. 100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks, 1 day ago 0|z0635s:
ZooKeeper ZOOKEEPER-3522

Consistency guarantees discussion.

Improvement Resolved Minor Fixed Karolos Antoniadis Karolos Antoniadis Karolos Antoniadis 26/Aug/19 18:28   05/Sep/19 20:54 28/Aug/19 17:03   3.6.0     0 3   It seems there is some confusion on the exact consistency guarantees of ZooKeeper.

The goal of this issue is to add a few paragraphs on the subject to [ZooKeeper internals|https://zookeeper.apache.org/doc/r3.5.5/zookeeperInternals.html].
pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 6 days ago 0|z0626o:
ZooKeeper ZOOKEEPER-3521

equals generate by jute potentially cause NPE

Bug In Progress Critical Unresolved Zili Chen Zili Chen Zili Chen 24/Aug/19 22:51   14/Dec/19 06:09   3.6.0 3.7.0 jute   0 1   jute generates {{equals}} as follows:


{code:java}
String genJavaEquals(String fname, String peer) {
    return " ret = " + fname + ".equals(" + peer + ");\n";
}
{code}

If {{fname}} is null at runtime, a {{NullPointerException}} is thrown; see [this report|https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/org.apache.zookeeper$zookeeper/1167/testReport/junit/org.apache.zookeeper.server/PrepRequestProcessorTest/testPRequest/] for instance.

Java already solves this problem with {{java.util.Objects.equals}}; I address this issue along with ZOOKEEPER-3290 in GH-839. But I need input from the C++ and C# side.

BTW, does anybody use jute's C# version, or even the C++ version?
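A hedged sketch of what the fixed generator could emit (illustrative strings, not the actual jute templates): route the comparison through {{java.util.Objects.equals}}, which tolerates null on either side.

```java
import java.util.Objects;

public class JuteEqualsSketch {

    // Shape of the current generator output: throws NPE when the field is null.
    public static String genJavaEqualsBuggy(String fname, String peer) {
        return "    ret = " + fname + ".equals(" + peer + ");\n";
    }

    // Null-safe variant the generator could emit instead.
    public static String genJavaEqualsSafe(String fname, String peer) {
        return "    ret = java.util.Objects.equals(" + fname + ", " + peer + ");\n";
    }

    // The behavior the safe form relies on.
    public static boolean fieldEquals(Object a, Object b) {
        return Objects.equals(a, b); // (null, null) -> true; (null, x) -> false
    }
}
```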
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 4 days ago 0|z060bs:
ZooKeeper ZOOKEEPER-3520

KeeperException should have an error message field besides path

Bug Open Major Unresolved Unassigned Zili Chen Zili Chen 24/Aug/19 22:43   24/Aug/19 22:43       server   0 2   We trickily set a diagnosis msg as the {{path}} of a {{KeeperException}} in {{CreateMode}} (see the code snippet below) because {{KeeperException}} doesn't have a dedicated error message field. It would be good to have a {{diagnosis}} field beside {{path}} for error messages beyond a path.


{code:java}
// CreateMode#L136
String errMsg = "Received an invalid flag value: " + flag + " to convert to a CreateMode";
LOG.error(errMsg);
throw new KeeperException.BadArgumentsException(errMsg);
{code}
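As a hedged sketch of the proposal (a hypothetical class, not the real {{KeeperException}} hierarchy), the exception could carry both fields:

```java
// Hypothetical exception carrying a dedicated diagnosis beside the path,
// instead of smuggling the error text through the path field.
public class DiagnosableException extends Exception {
    private final String path;      // the znode path, when one applies
    private final String diagnosis; // free-form error detail

    public DiagnosableException(String path, String diagnosis) {
        super(diagnosis != null ? diagnosis : String.valueOf(path));
        this.path = path;
        this.diagnosis = diagnosis;
    }

    public String getPath() { return path; }
    public String getDiagnosis() { return diagnosis; }
}
```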
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 4 days ago 0|z060bk:
ZooKeeper ZOOKEEPER-3519

upgrade dependency-check to 5.2.1

Improvement Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Aug/19 22:02   16/Oct/19 14:58 28/Aug/19 09:33 3.6.0, 3.5.5, 3.4.14 3.6.0, 3.4.15, 3.5.6 build-infrastructure   0 2 0 2400   Minor upgrade to dependency-check - upgrade to latest 5.2.1 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks ago 0|z060bc:
ZooKeeper ZOOKEEPER-3518

owasp check flagging jackson-databind 2.9.9.1

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Aug/19 21:50   16/Oct/19 14:58 25/Aug/19 15:34 3.6.0, 3.5.5 3.6.0, 3.5.6 security   0 1 0 1800   owasp check is flagging jackson-databind 2.9.9.1 - upgrade to 2.9.9.3

CVE-2019-14379, CVE-2019-14439, CVE-2019-12384
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 3 days ago 0|z060b4:
ZooKeeper ZOOKEEPER-3517

ZOOKEEPER-3431 Turn on BookKeeper checkstyle configuration at project level

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 23/Aug/19 19:25   29/Aug/19 06:16 27/Aug/19 03:23   3.6.0 build   0 2 0 5400   while still using the simple checkstyle configuration for {{zookeeper-contrib}}, because the full one is not worth applying there. 100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks ago 0|z05zww:
ZooKeeper ZOOKEEPER-3516

Zookeeper not working with SSL and remote authentication enabled

Bug Open Major Unresolved Unassigned Rohit Singh Rohit Singh 23/Aug/19 16:43   23/Aug/19 16:45   3.4.8   jmx   0 1    
{code:java}
-Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.port=9992 -Dcom.sun.management.jmxremote.rmi.port=9993 -Dcom.sun.management.jmxremote.password.file=/zookeeper/zookeeper-3.4.8/conf/jmxremote-password -Dcom.sun.management.jmxremote.access.file=/zookeeper/zookeeper-3.4.8/conf/jmxremote-access -Dcom.sun.management.jmxremote.ssl=true -Djavax.net.ssl.keyStore=/opt/zookeeper/certificate.ks -Djavax.net.ssl.keyStorePassword=YmM1NTkwZTVlZDg0 -Djavax.net.ssl.trustStore=/opt/zookeeper/serviceCA.ts -Djavax.net.ssl.trustStorePassword=YmM1NTkwZTVlZDg0 -Dcom.sun.management.jmxremote.registry.ssl=true -Dzookeeper.jmx.log4j.disable= -Djava.rmi.server.hostname=<hostname> org.apache.zookeeper.server.quorum.QuorumPeerMain
{code}
When ZooKeeper is brought up with the above options, the following error is seen:
{code:java}
Error: Exception thrown by the agent : java.lang.IllegalArgumentException: Expected word at end of line [readwrite ]
{code}
However, when {{-Dcom.sun.management.jmxremote.authenticate}} is set to false, ZooKeeper starts without any errors; SSL works, but remote authentication is disabled.
{code:java}
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=9992 -Dcom.sun.management.jmxremote.rmi.port=9993 -Dcom.sun.management.jmxremote.password.file=/zookeeper/zookeeper-3.4.8/conf/jmxremote-password -Dcom.sun.management.jmxremote.access.file=/zookeeper/zookeeper-3.4.8/conf/jmxremote-access -Dcom.sun.management.jmxremote.ssl=true -Djavax.net.ssl.keyStore=/opt/zookeeper/certificate.ks -Djavax.net.ssl.keyStorePassword=YzJhZjIxN2Q2ODQ4 -Djavax.net.ssl.trustStore=/opt/zookeeper/serviceCA.ts -Djavax.net.ssl.trustStorePassword=YzJhZjIxN2Q2ODQ4 -Dcom.sun.management.jmxremote.registry.ssl=true -Dzookeeper.jmx.log4j.disable= -Djava.rmi.server.hostname=<hostname> org.apache.zookeeper.server.quorum.QuorumPeerMain
{code}
Is this behavior expected?

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 6 days ago 0|z05zso:
ZooKeeper ZOOKEEPER-3515

server.xml showing keystore and truststore passwords as clear text

Bug Open Critical Unresolved Unassigned Holger Herbert Holger Herbert 23/Aug/19 12:13   01/Jan/20 22:41   3.5.5 3.7.0 server   7 3   On all platforms, the values for keystorePass and truststorePass are stored in clear text in server.xml 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Important
11 weeks ago 0|z05zio:
ZooKeeper ZOOKEEPER-3514

Use client certificate SAN list for X.509 ACL AuthZ

Improvement Open Major Unresolved Unassigned Jon Bringhurst Jon Bringhurst 22/Aug/19 15:29   26/Aug/19 15:44           0 3   Hello! We have a TLS environment where services currently utilize various client certificate SAN fields for authentication. For example, a client certificate would contain something like this:

{noformat}
X509v3 Subject Alternative Name: critical
DNS:zookeeper-server-001.example.com, DNS:APPLICATION_NAME, DNS:DATACENTER_NAME
{noformat}

My current approach is to simply add the SAN list to the cnxn AuthInfo list. For example (in X509AuthenticationProvider):

{noformat}
protected List<String> getAlternativeClientIds(X509Certificate clientCert) {
    // not shown: filtering on type 2 here
    return clientCert.getSubjectAlternativeNames();
}
{noformat}

{noformat}
if (this.sslAclIncludeSANAuthZEnabled) {
    List<String> alternativeClientIds = getAlternativeClientIds(clientCert);
    for (int i = 0; i < alternativeClientIds.size(); i++) {
        Id altAuthInfo = new Id(getScheme(), alternativeClientIds.get(i));
        cnxn.addAuthInfo(altAuthInfo);

        LOG.info("Authenticated Alternative Id '{}' for Scheme '{}'", altAuthInfo.getId(), altAuthInfo.getScheme());
    }
}
{noformat}

So, ACLs would then look something like this (given the example SAN list shown above):

{noformat}
x509:zookeeper-server-001.example.com:perm
x509:APPLICATION_NAME:perm
x509:DATACENTER_NAME:perm
{noformat}

Before I spend time putting it together, would a patch for this functionality have any chance of being accepted (any suggestions for alternative approaches)? If so, how do you feel about the config option name sslAclIncludeSANAuthZEnabled?
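For reference, the type-2 filtering elided in the first snippet could look roughly like this (an illustrative sketch; {{getSubjectAlternativeNames()}} returns a collection of 2-element lists whose first element is the GeneralName type code, with 2 meaning dNSName):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SanFilterSketch {
    // Keep only dNSName (type 2) SAN entries, as the AuthZ proposal needs.
    public static List<String> dnsNames(Collection<? extends List<?>> subjectAltNames) {
        List<String> out = new ArrayList<>();
        if (subjectAltNames == null) {
            return out; // getSubjectAlternativeNames() may return null
        }
        for (List<?> entry : subjectAltNames) {
            if (((Integer) entry.get(0)) == 2) {
                out.add((String) entry.get(1));
            }
        }
        return out;
    }
}
```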
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 3 days ago 0|z05y0o:
ZooKeeper ZOOKEEPER-3513

Zookeeper upgrade fails due to missing snapshots

Bug Resolved Major Duplicate Unassigned Stephan Huttenhuis Stephan Huttenhuis 21/Aug/19 04:18   04/Dec/19 09:16 04/Dec/19 09:16 3.5.4, 3.6.0   server   0 4   In ZOOKEEPER-2325 a check was added that requires a snapshot when loading data. We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5 months for use with Solr Cloud. During this time some ensembles created a few snapshots but others didn't generate any. Because of this, upgrading to e.g. 3.5.5 fails.

Either it is perfectly possible for ZooKeeper data to have no snapshots, or something is going wrong with generating snapshots. The ensembles are straightforward.
- The following stack occurs:
{noformat}
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
{noformat}

- The zoo.cfg
{noformat}
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181

server.1=myserver1:2888:3888
server.2=myserver2:2888:3888
server.3=myserver3:2888:3888
{noformat}
 

- The contents of /data/zookeeper/data/version-2
{noformat}
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  7 21:50 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  8 20:38 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  1 14:44 log.1
-rw-r--r-- 1 zookeeper zookeeper  65M May 15 23:30 log.100000001
-rw-r--r-- 1 zookeeper zookeeper  65M Jul  3 23:21 log.100001645
-rw-r--r-- 1 zookeeper zookeeper  65M Aug  8 20:37 log.300000802
-rw-r--r-- 1 zookeeper zookeeper  65M Aug 20 13:58 log.70000062a
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  4 21:22 log.f0
{noformat}
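The shape of the restore-time check that throws the exception above can be sketched as follows (an illustrative model, not the real {{FileTxnSnapLog}} code; the {{trustEmptySnapshot}} escape hatch is hypothetical here):

```java
import java.io.IOException;

public class RestoreCheckSketch {

    // Returns the zxid to start from; throws when log entries exist
    // but no snapshot was found, mirroring the reported failure mode.
    public static long restore(boolean snapshotFound, boolean hasLogEntries,
                               boolean trustEmptySnapshot) throws IOException {
        if (!snapshotFound) {
            if (hasLogEntries && !trustEmptySnapshot) {
                throw new IOException(
                        "No snapshot found, but there are log entries. Something is broken!");
            }
            return 0L; // start from an empty data tree
        }
        return 1L; // deserialize the snapshot (elided)
    }
}
```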
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
15 weeks, 1 day ago 0|z05vfc:
ZooKeeper ZOOKEEPER-3512

ZOOKEEPER-3114 Real time data integrity check during broadcast time

Sub-task Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 20/Aug/19 11:35   07/Jan/20 02:22 27/Dec/19 16:35   3.6.0 server   0 2 0 27600   This is a sub-task of ZOOKEEPER-3114, which is going to calculate the hash value of the data tree after each txn; the leader will propose each txn with a digest, and when a learner applies the txn to its data tree, it will check whether it has the same hash value.

Currently, the default behavior is logging and reporting via metrics; people can also implement auto-recovery based on the event received from DigestWatcher.
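The idea can be illustrated with a toy running digest (an illustrative hash, not ZooKeeper's actual digest algorithm): the leader folds each txn into a digest that it ships alongside the proposal, and the learner recomputes and compares after applying.

```java
import java.util.Objects;

public class TreeDigestSketch {

    // Fold one applied txn (path, data) into the running digest.
    public static long nextDigest(long prevDigest, String path, String data) {
        return 31 * prevDigest + Objects.hash(path, data);
    }

    // Learner-side check after applying the txn.
    public static boolean digestMatches(long leaderDigest, long learnerDigest) {
        return leaderDigest == learnerDigest;
    }
}
```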
100% 100% 27600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 6 days ago 0|z05uco:
ZooKeeper ZOOKEEPER-3511

Remove 'membership:' string from output of 'conf' 4lw

Task Open Major Unresolved Unassigned Andor Molnar Andor Molnar 15/Aug/19 11:32   02/Sep/19 07:21   3.6.0, 3.5.5   server   0 2   It causes problems when the user is trying to automatically parse the output. Other than this redundant string, the output conforms to the properties format.

Reported by Solr folks.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks ago 0|z05pdc:
ZooKeeper ZOOKEEPER-3510

Frequent 'zkServer.sh stop' failures when running C test suite

Bug Closed Minor Fixed Unassigned Damien Diederen Damien Diederen 15/Aug/19 04:18   16/Oct/19 14:59 23/Aug/19 06:16   3.6.0, 3.5.6     0 2 0 3600   As mentioned in https://github.com/apache/zookeeper/pull/1054#discussion_r314208678 :

There is a {{sleep 3}} statement in {{zkServer.sh restart}}. I am unable to unearth the history of that particular line, but I believe part—if not all—of that {{sleep}} should be part of {{zkServer.sh stop}}.

I frequently observe {{FAILED TO START}} errors in the C test suite; the logs consistently show that those are caused by {{java.net.BindException: Address already in use}}. Adding a simple {{sleep 1}} before {{echo STOPPED}} "fixes" it for me. I will submit an initial PR with the corresponding change and a commit message akin to:

----

ZOOKEEPER-XXXX: Make zkServer.sh stop more reliable

Kill is asynchronous, and without the sleep, the server's TCP port can still be busy when the next server is started—causing flaky runs of the C client's test suite.

(It would probably be better to spin a few times, probing with ps -p.)

----

As noted above, the sleep is far from optimal; an adaptive mechanism would be better, but I do not want to make the first iteration too complicated.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 6 days ago 0|z05owo:
ZooKeeper ZOOKEEPER-3509

Revisit log format

Improvement Resolved Major Fixed Zili Chen Zili Chen Zili Chen 14/Aug/19 10:23   10/Oct/19 13:00 10/Oct/19 08:38   3.6.0 server   0 2 0 14400   Currently ZooKeeper mixes different log formats, and a number of log statements are even buggy. It is an opportunity to revisit the log format in ZooKeeper and do a pass to fix all log-format-related issues. 100% 100% 14400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
23 weeks ago 0|z05nxk:
ZooKeeper ZOOKEEPER-3508

Strategy on line break

Improvement Open Major Unresolved Unassigned Zili Chen Zili Chen 14/Aug/19 10:19   14/Aug/19 10:21           0 1   While enabling the checkstyle configuration on the zookeeper-server module, it raised the question of how we generally break/wrap long lines: whether we should introduce a standard and, if so, how we ensure it is followed. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks, 1 day ago 0|z05nxc:
ZooKeeper ZOOKEEPER-3507

Revisit name patterns

Improvement Open Major Unresolved Unassigned Zili Chen Zili Chen 14/Aug/19 10:16   14/Aug/19 10:16           0 1   While enabling the checkstyle configuration on zookeeper-server, it raised the question of how we treat variable/type names that fall outside the suggested patterns. Backward compatibility and a consistent view should be taken into consideration. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks, 1 day ago 0|z05nx4:
ZooKeeper ZOOKEEPER-3506

correct the SessionTrackerImpl#initializeNextSession's javaDoc about how to generate the sessionId

Improvement Resolved Minor Fixed maoling maoling maoling 14/Aug/19 04:47   20/Sep/19 14:20 20/Sep/19 04:03   3.6.0 documentation, server   0 2 0 2400    
{code:java}
/**
* Generates an initial sessionId. High order byte is serverId, next 5
* 5 bytes are from timestamp, and low order 2 bytes are 0s.
*/
public static long initializeNextSession(long id) {
    long nextSid;
    nextSid = (Time.currentElapsedTime() << 24) >>> 8;
    nextSid = nextSid | (id << 56);
    if (nextSid == EphemeralType.CONTAINER_EPHEMERAL_OWNER) {
        ++nextSid;  // this is an unlikely edge case, but check it just in case
    }
    return nextSid;
}
{code}
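The bit layout the javadoc is describing can be checked directly; below is a minimal sketch that mirrors the formula above, with the timestamp passed in explicitly so the result is deterministic (the standalone class and the explicit-timestamp parameter are illustrative, not ZooKeeper's API):

```java
public class SessionIdLayout {
    // Mirrors the formula in SessionTrackerImpl#initializeNextSession:
    // byte 7 = serverId, bytes 6..2 = low 5 bytes of the timestamp, bytes 1..0 = 0.
    static long nextSessionId(long serverId, long elapsedMillis) {
        long nextSid = (elapsedMillis << 24) >>> 8; // timestamp bits land in bits 16..55
        nextSid |= (serverId << 56);                // server id in the high-order byte
        return nextSid;
    }

    public static void main(String[] args) {
        long sid = nextSessionId(3L, 0x123456789AL);
        System.out.println(Long.toHexString(sid)); // high byte 03, low 2 bytes 0000
    }
}
```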
 

 
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 6 days ago 0|z05ng0:
ZooKeeper ZOOKEEPER-3505

Add a shell script to make SnapshotComparer tool more user-friendly

Improvement Resolved Major Fixed Maya Wang Maya Wang Maya Wang 13/Aug/19 18:38   01/Mar/20 21:12 01/Mar/20 21:12         0 1   Ref: [https://github.com/apache/zookeeper/pull/984/]

SnapshotComparer is a tool that assists debugging with snapshots (ZOOKEEPER-3427). We want to provide a shell script, like [https://github.com/apache/zookeeper/blob/master/bin/zkTxnLogToolkit.sh], to make the SnapshotComparer tool more user-friendly.

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks, 3 days ago ZOOKEEPER-3427 0|z05myw:
ZooKeeper ZOOKEEPER-3504

An information leakage from FileTxnSnapLog to log:

Bug Open Major Unresolved Unassigned xiaoqin.fu xiaoqin.fu 12/Aug/19 22:34   13/Aug/19 04:50   3.4.11, 3.4.12, 3.4.13, 3.5.5, 3.4.14   security, server   0 2   In org.apache.zookeeper.server.persistence.FileTxnSnapLog, the LOG.debug statement is not guarded by a log-level check:
{code:java}
public void processTransaction(TxnHeader hdr, DataTree dt,
        Map<Long, Integer> sessions, Record txn)
        throws KeeperException.NoNodeException {
    ......
    if (rc.err != Code.OK.intValue()) {
        LOG.debug("Ignoring processTxn failure hdr:" + hdr.getType()
                + ", error: " + rc.err + ", path: " + rc.path);
    }
    ......
}
{code}
Sensitive information about the hdr type or rc path may be leaked. A LOG.isDebugEnabled() conditional should be added:
{code:java}
public void processTransaction(TxnHeader hdr, DataTree dt,
        Map<Long, Integer> sessions, Record txn)
        throws KeeperException.NoNodeException {
    ......
    if (rc.err != Code.OK.intValue()) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Ignoring processTxn failure hdr:" + hdr.getType()
                    + ", error: " + rc.err + ", path: " + rc.path);
        }
    }
    ......
}
{code}
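The same guard pattern can be demonstrated with the stdlib; this is a self-contained sketch using java.util.logging for illustration only (ZooKeeper itself uses SLF4J, where parameterized messages such as LOG.debug("... {}", hdr.getType()) avoid the string concatenation without an explicit guard; the class and method here are hypothetical):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Builds the (potentially sensitive, potentially expensive) message only
    // when the debug-equivalent level (FINE) is actually enabled.
    static String describeFailure(int type, int err, String path) {
        if (LOG.isLoggable(Level.FINE)) {
            String msg = "Ignoring processTxn failure hdr:" + type
                    + ", error: " + err + ", path: " + path;
            LOG.fine(msg);
            return msg;
        }
        return null; // level disabled: no message built, nothing leaked to the log
    }
}
```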
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks, 2 days ago 0|z05leo:
ZooKeeper ZOOKEEPER-3503

Add server side large request throttling

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 10/Aug/19 12:32   16/Sep/19 20:17 16/Sep/19 17:26 3.6.0 3.6.0 server   0 2 0 6600   This task adds a new request limiting mechanism to ZooKeeper that aims to protect ZooKeeper from accepting too many large requests and crashing because it runs out of memory. This is designed to augment the connection throttling (ZOOKEEPER-3242) and request throttling (ZOOKEEPER-3243), which focus on limiting the number rather than size of requests. 100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
26 weeks, 3 days ago 0|z05jag:
ZooKeeper ZOOKEEPER-3502

improve the server command: zabstate to have a better observation on the process of leader election

Improvement Resolved Minor Fixed maoling maoling maoling 10/Aug/19 05:43   20/Nov/19 01:33 20/Nov/19 01:01   3.6.0 server   0 2 0 4200   100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 1 day ago 0|z05j2w:
ZooKeeper ZOOKEEPER-3501

unify the method:op2String()

Improvement Resolved Minor Fixed maoling maoling maoling 09/Aug/19 05:53   20/Sep/19 14:20 20/Sep/19 04:07   3.6.0 server   0 2 0 3000   There are two duplicate methods

*public static String op2String(int op)*

in the code base:

 
{code:java}
org.apache.zookeeper.server.TraceFormatter#op2String
org.apache.zookeeper.server.Request#op2String
{code}
 

and they are inconsistent; we should unify them and keep only one

 
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 6 days ago 0|z05i20:
ZooKeeper ZOOKEEPER-3500

Improving the ZAB UPTODATE semantic to only issue it to learner when there is limited lagging

Improvement Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 08/Aug/19 13:12   10/Sep/19 21:08       server   0 1 0 7800   With a large snapshot and high write RPS, when a learner does a SNAP sync with the leader, there will be lots of txns to replay between the NEWLEADER and UPTODATE packets.
 
Depending on how big the snapshot and the traffic are, our benchmarks show it may take more than 30s to replay all those txns, which means that when we process the UPTODATE packet we are still 30s behind; at 10K txns/s that is 300K txns of lag.
 
And we start to serve client traffic just after receiving the UPTODATE packet, which means clients will see lots of stale data.
 
The idea here is to check on the leader side and only send the UPTODATE packet when the learner is lagging behind by a limited number of txns. It doesn't change the ZAB protocol, but it changes the time at which ZK applies the txns between NEWLEADER and UPTODATE.
 
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks ago 0|z05h60:
ZooKeeper ZOOKEEPER-3499

[admin server way] Add a complete backup mechanism for zookeeper internal

New Feature In Progress Major Unresolved maoling maoling maoling 08/Aug/19 02:43   14/Dec/19 06:08     3.7.0 server   0 1 0 600   100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks ago 0|z05gc8:
ZooKeeper ZOOKEEPER-3498

In zookeeper-jute project generated source should not be in target\classes folder

Bug Closed Major Fixed Zili Chen Mohammad Arshad Mohammad Arshad 07/Aug/19 07:49   16/Oct/19 14:59 08/Aug/19 11:05 3.5.5 3.6.0, 3.5.6 build   0 3 0 2400   Currently in the zookeeper-jute project, jute-generated source code is put in the target\classes folder. In Eclipse, when the project is refreshed/cleaned, this folder's content gets deleted, which results in compilation errors in other projects.
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks ago 0|z05eu0:
ZooKeeper ZOOKEEPER-3497

Make org.apache.zookeeper.jmx.MBeanRegistry a standard singleton

Improvement Open Minor Unresolved Unassigned jwhao jwhao 07/Aug/19 06:09   04/Mar/20 23:29   3.6.0   jmx   0 3 0 2400   {{org.apache.zookeeper.jmx.MBeanRegistry}} is effectively a singleton, but its constructor isn't private. We could make it a standard singleton.
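A standard singleton of the kind the report suggests looks like this; a generic sketch, not MBeanRegistry's actual code (the class name is illustrative):

```java
public class Registry {
    // Eagerly created single instance; class loading guarantees thread safety.
    private static final Registry INSTANCE = new Registry();

    // The private constructor is what distinguishes a standard singleton from
    // an "effective" one: outside code can no longer instantiate the class.
    private Registry() { }

    public static Registry getInstance() {
        return INSTANCE;
    }
}
```

Callers then use `Registry.getInstance()` everywhere, and `new Registry()` no longer compiles outside the class.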
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 weeks ago 0|z05epc:
ZooKeeper ZOOKEEPER-3496

Transaction larger than jute.maxbuffer makes ZooKeeper unavailable

Bug Closed Critical Fixed Mohammad Arshad Mohammad Arshad Mohammad Arshad 07/Aug/19 04:26   14/Feb/20 10:23 26/Sep/19 02:56 3.5.5, 3.4.14 3.6.0, 3.5.7     0 4 0 22800   *Problem:*
ZooKeeper server fails to start, logging the following error:
{code:java}
Exception in thread "main" java.io.IOException: Unreasonable length = 1001025
at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
{code}
This indicates that one of the transactions is larger than the configured jute.maxbuffer value. But how was a transaction larger than jute.maxbuffer allowed to be written in the first place?


*Analysis:*
On the ZooKeeper server, jute.maxbuffer specifies the maximum size of a transaction; by default it is 1 MB.
jute.maxbuffer is used for the following:
# Size sanity check of incoming requests. An incoming request's size must not be more than jute.maxbuffer
# Size sanity check of a transaction while reading from the transaction or snapshot file. The transaction size must not be more than jute.maxbuffer+1024
# Size sanity check of a transaction while reading data from the leader. The transaction size must not be more than jute.maxbuffer+1024

The request size sanity check is done at the beginning of request processing, but later stages of request processing add additional information to the request before writing it to the transaction file. This additional information is not counted in the sanity check. This is how transactions larger than jute.maxbuffer are accepted into ZooKeeper.

If this additional information is less than 1024 bytes, it is OK, as ZooKeeper already accounts for it.
But if this additional information is more than 1024 bytes, the request is still allowed; then reading from the transaction/snapshot file, or reading from the leader, fails and makes the ZooKeeper service unavailable.

+Example:+
Suppose the incoming request size is 1000000 bytes
The configured jute.maxbuffer is 1000000
After processing the request, the ZooKeeper server adds 1025 more bytes
In this case the request is processed successfully, and 1000000+1025 bytes are written to the transaction file
But while reading from the transaction log, 1000000+1025 bytes cannot be read, as the max allowed length is 1000000 (effectively 1000000+1024).
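The arithmetic in the example can be sketched directly; a minimal model of the read-side length check, assuming the 1024-byte slack described above (the class, method, and configured value are the example's hypotheticals, not BinaryInputArchive's actual code):

```java
public class MaxBufferCheck {
    static final int EXTRA = 1024; // slack the read path allows on top of jute.maxbuffer

    // Mirrors the read-side sanity check: a stored record is readable only if
    // its length fits within maxBuffer + EXTRA.
    static boolean isReadable(int len, int maxBuffer) {
        return len >= 0 && len <= maxBuffer + EXTRA;
    }

    public static void main(String[] args) {
        int maxBuffer = 1_000_000;        // configured jute.maxbuffer from the example
        int written = 1_000_000 + 1025;   // request plus 1025 bytes added during processing
        // The write path accepted the request, but the read path rejects it:
        System.out.println(isReadable(written, maxBuffer)); // false
    }
}
```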

*Solutions:*
If the incoming request size sanity check were done after populating all the additional information, this problem would be solved. But doing the sanity check at a later stage of request processing would defeat the purpose of the sanity check itself, so we cannot do this.

Currently the additional-information allowance is a constant 1024 bytes [Code Reference|https://github.com/apache/zookeeper/blob/branch-3.5/zookeeper-jute/src/main/java/org/apache/jute/BinaryInputArchive.java#L126]. We should increase this value and make it more reasonable. I propose making this allowance the same size as jute.maxbuffer, and also making it configurable.

100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
25 weeks ago 0|z05ehc:
ZooKeeper ZOOKEEPER-3495

Broken test in JDK12+: SnapshotDigestTest.testDifferentDigestVersion

Test Resolved Minor Fixed Mate Szalay-Beko Andor Molnar Andor Molnar 06/Aug/19 11:50   10/Sep/19 14:09 10/Sep/19 09:22   3.6.0     0 3 0 21000   This test uses reflection to get access to "modifiers" field in Field class which is not supported any longer in Java 12+ versions. Please modify the test accordingly. 100% 100% 21000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 2 days ago 0|z05drc:
ZooKeeper ZOOKEEPER-3494

No need to depend on netty-all (SSL)

Improvement Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 05/Aug/19 07:45   21/Oct/19 03:14 05/Aug/19 19:56 3.6.0, 3.5.5 3.6.0, 3.5.6     1 3 0 7800   ZooKeeper currently depends on netty-all to satisfy the requirements for the SSL capability.
netty-all is an unnecessarily broad dependency; we have already had a request to revise it (https://issues.apache.org/jira/browse/SOLR-13665).

netty-handler (429 KB) and netty-transport-native-epoll (115 KB) look like they are enough.
netty-all was ~4 MB.
This reduces the size of the netty dependency to about 1/8.
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z05by8:
ZooKeeper ZOOKEEPER-3493

Deflake testConcurrentRequestProcessingInCommitProcessor in CommitProcessorMetricsTest

Improvement Resolved Major Duplicate Unassigned Jie Huang Jie Huang 05/Aug/19 02:08   06/Aug/19 05:25 06/Aug/19 05:25 3.6.0       0 1 0 4200   100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 3 days ago 0|z05bmw:
ZooKeeper ZOOKEEPER-3492

Add weights to server side connection throttling

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 04/Aug/19 16:11   18/Sep/19 10:48 05/Sep/19 17:13   3.6.0 server   0 3 0 9000   In ZOOKEEPER-3242, we introduced connection throttling to protect the server from being overloaded. We realize that the costs for creating a local session, creating a global session, and reconnecting are different. So we associate weights to the costs when throttling. For example, for the same setting, the throttler will allow more connections to be created if they are local.  This allows the server resources to be fully utilized. 100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
28 weeks ago 0|z05bew:
ZooKeeper ZOOKEEPER-3491

Specify commitLogCount value using a system property

Improvement Resolved Major Fixed Vladimir Ivić Vladimir Ivić Vladimir Ivić 02/Aug/19 20:16   13/Sep/19 22:09 09/Sep/19 18:27 3.6.0 3.6.0 quorum, server   0 2 0 5400   Currently the commit log count value is set to 500. This can cause busy servers to snapshot transactions too often.

Override default commitLogCount=500 through the system property zookeeper.commitLogCount.
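Reading such an override is one line with the stdlib; a sketch of the pattern, using the property name from the description (the surrounding class and method are hypothetical, not ZooKeeper's actual code):

```java
public class CommitLogConfig {
    static final int DEFAULT_COMMIT_LOG_COUNT = 500;

    // Integer.getInteger reads a system property as an int, falling back to
    // the default when the property is unset or not a parsable integer.
    static int commitLogCount() {
        return Integer.getInteger("zookeeper.commitLogCount", DEFAULT_COMMIT_LOG_COUNT);
    }
}
```

Starting the server with `-Dzookeeper.commitLogCount=2000` would then raise the value without a code change.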
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 2 days ago 0|z05au8:
ZooKeeper ZOOKEEPER-3490

Zookeeper followers not reflecting writes (after months)

Bug Open Major Unresolved Unassigned Matthew Hertz Matthew Hertz 02/Aug/19 10:51   02/Aug/19 10:52   3.4.13       0 1   Hi,

We have a 3 node Zookeeper cluster. There are a number of znodes on the leader that are not visible on the followers.
{code:java}
$ zkCli -server <server 1> (follower)
[zk: <server 1>(CONNECTED) 0] get /pyMkdProducer/SNAP/lock/c4a62c9fdfdc412fac3818bbb2af3a0f__lock__0000000040
abcd.company.com:<built-in function getpid>
cZxid = 0xf00061d68
ctime = Thu Nov 01 12:40:33 GMT 2018
mZxid = 0xf00061d68
mtime = Thu Nov 01 12:40:33 GMT 2018
pZxid = 0xf00061d68
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x500be5318d60407
dataLength = 58
numChildren = 0
{code}
{code:java}
$ zkCli -server <server 2> (leader)
[zk: <server2>(CONNECTED) 0] get /pyMkdProducer/SNAP/lock/c4a62c9fdfdc412fac3818bbb2af3a0f__lock__0000000040 Node does not exist: /pyMkdProducer/SNAP/lock/c4a62c9fdfdc412fac3818bbb2af3a0f__lock__0000000040
{code}
{code:java}
$ zkCli -server <server 3> (follower)
[zk: <server3>(CONNECTED) 0] get /pyMkdProducer/SNAP/lock/c4a62c9fdfdc412fac3818bbb2af3a0f__lock__0000000040
abcd.company.com:<built-in function getpid>
cZxid = 0xf00061d68
ctime = Thu Nov 01 12:40:33 GMT 2018
mZxid = 0xf00061d68
mtime = Thu Nov 01 12:40:33 GMT 2018
pZxid = 0xf00061d68
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x500be5318d60407
dataLength = 58
numChildren = 0
{code}
These nodes are ephemeral nodes. The sessions no longer exist. There are 6 znodes in this 'inconsistent' state. The cluster is currently connected - there are no networking partitions currently.

We're at a loss for how to both debug and fix this. Restarting the Zookeeper followers presumably will not help? Are all nodes ever force-synced from the leader?

Help would be appreciated. If any more information would be helpful it can be provided, however we will likely have to resolve this issue one way or another in the near future.

Thanks

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 6 days ago 0|z05ac8:
ZooKeeper ZOOKEEPER-3489

Possible information leakage to log without LOG configuration control LOG.isWarnEnabled()

Bug Open Major Unresolved Unassigned xiaoqin.fu xiaoqin.fu 02/Aug/19 07:10   05/Aug/19 23:19       java client, security   0 1   Ubuntu 16.04.3 LTS
Open JDK version "1.8.0_191" build 25.191-b12
In org.apache.zookeeper.ClientCnxn$SendThread, LOG.warn(...) statements are not guarded by a log-level check:
{code:java}
void readResponse(ByteBuffer incomingBuffer) throws IOException {
    ......
    LOG.warn("Got server path " + event.getPath()
            + " which is too short for chroot path "
            + chrootPath);
    ......
}
{code}
Sensitive information about the event path and chroot path may be leaked. A LOG.isWarnEnabled() conditional should be added:
{code:java}
void readResponse(ByteBuffer incomingBuffer) throws IOException {
    ......
    if (LOG.isWarnEnabled()) {
        LOG.warn("Got server path " + event.getPath()
                + " which is too short for chroot path "
                + chrootPath);
    }
    ......
}
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z05a2g:
ZooKeeper ZOOKEEPER-3488

Possible information leakage to log without LOG configuration control LOG.isInfoEnabled()

Bug Open Major Unresolved Unassigned xiaoqin.fu xiaoqin.fu 02/Aug/19 07:06   13/Aug/19 04:06   3.4.11, 3.4.12, 3.4.13, 3.5.5, 3.4.14   security, server   0 1   Ubuntu 16.04.3 LTS
Open JDK version "1.8.0_191" build 25.191-b12
In org.apache.zookeeper.server.ZooKeeperServer, LOG.info(...) statements are not guarded by a log-level check:
{code:java}
public ZooKeeperServer(FileTxnSnapLog txnLogFactory, int tickTime,
        int minSessionTimeout, int maxSessionTimeout, ZKDatabase zkDb) {
    ......
    LOG.info("Created server with tickTime " + tickTime
            + " minSessionTimeout " + getMinSessionTimeout()
            + " maxSessionTimeout " + getMaxSessionTimeout()
            + " datadir " + txnLogFactory.getDataDir()
            + " snapdir " + txnLogFactory.getSnapDir());
    ......
}

public void finishSessionInit(ServerCnxn cnxn, boolean valid) {
    ......
    if (!valid) {
        LOG.info("Invalid session 0x"
                + Long.toHexString(cnxn.getSessionId())
                + " for client "
                + cnxn.getRemoteSocketAddress()
                + ", probably expired");
        cnxn.sendBuffer(ServerCnxnFactory.closeConn);
    } else {
        LOG.info("Established session 0x"
                + Long.toHexString(cnxn.getSessionId())
                + " with negotiated timeout " + cnxn.getSessionTimeout()
                + " for client "
                + cnxn.getRemoteSocketAddress());
        cnxn.enableRecv();
    }
    ......
}
{code}
Sensitive information about the data dir, snap dir, session id, and remote socket address may be leaked. It would be better to add LOG.isInfoEnabled() conditionals:
{code:java}
public ZooKeeperServer(FileTxnSnapLog txnLogFactory, int tickTime,
        int minSessionTimeout, int maxSessionTimeout, ZKDatabase zkDb) {
    ......
    if (LOG.isInfoEnabled()) {
        LOG.info("Created server with tickTime " + tickTime
                + " minSessionTimeout " + getMinSessionTimeout()
                + " maxSessionTimeout " + getMaxSessionTimeout()
                + " datadir " + txnLogFactory.getDataDir()
                + " snapdir " + txnLogFactory.getSnapDir());
    }
    ......
}

public void finishSessionInit(ServerCnxn cnxn, boolean valid) {
    ......
    if (!valid) {
        if (LOG.isInfoEnabled()) {
            LOG.info("Invalid session 0x"
                    + Long.toHexString(cnxn.getSessionId())
                    + " for client "
                    + cnxn.getRemoteSocketAddress()
                    + ", probably expired");
        }
        cnxn.sendBuffer(ServerCnxnFactory.closeConn);
    } else {
        if (LOG.isInfoEnabled()) {
            LOG.info("Established session 0x"
                    + Long.toHexString(cnxn.getSessionId())
                    + " with negotiated timeout " + cnxn.getSessionTimeout()
                    + " for client "
                    + cnxn.getRemoteSocketAddress());
        }
        cnxn.enableRecv();
    }
    ......
}
{code}
A LOG.isInfoEnabled() conditional already exists in org.apache.zookeeper.server.persistence.FileTxnLog:
{code:java}
public synchronized boolean append(TxnHeader hdr, Record txn) throws IOException {
    ......
    if (LOG.isInfoEnabled()) {
        LOG.info("Creating new log file: " + Util.makeLogName(hdr.getZxid()));
    }
    ......
}
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks, 2 days ago 0|z05a28:
ZooKeeper ZOOKEEPER-3487

Executing the same conditional statement twice in ZooTrace

Improvement Open Minor Unresolved Unassigned xiaoqin.fu xiaoqin.fu 02/Aug/19 06:55   28/Oct/19 10:25   3.4.11, 3.4.12, 3.4.13, 3.5.5, 3.4.14   server   0 2   Ubuntu 16.04.3 LTS
Open JDK version "1.8.0_191" build 25.191-b12
{code:java}
public static void logTraceMessage(Logger log, long mask, String msg) {
    if (isTraceEnabled(log, mask)) {
        log.trace(msg);
    }
}

static public void logQuorumPacket(Logger log, long mask,
        char direction, QuorumPacket qp) {
    if (isTraceEnabled(log, mask)) {
        logTraceMessage(log, mask, direction +
                " " + LearnerHandler.packetToString(qp));
    }
}
{code}
We should remove one of the two "if (isTraceEnabled(log, mask))" conditional statements:
{code:java}
public static void logTraceMessage(Logger log, long mask, String msg) {
    if (isTraceEnabled(log, mask)) {
        log.trace(msg);
    }
}

static public void logQuorumPacket(Logger log, long mask,
        char direction, QuorumPacket qp) {
    logTraceMessage(log, mask, direction +
            " " + LearnerHandler.packetToString(qp));
}
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
20 weeks, 3 days ago 0|z05a1k:
ZooKeeper ZOOKEEPER-3486

add the doc about how to configure SSL/TLS for the admin server

Improvement In Progress Minor Unresolved maoling maoling maoling 02/Aug/19 01:43   07/Mar/20 00:11     3.7.0 documentation   0 1 0 1200   100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 6 days ago 0|z059mw:
ZooKeeper ZOOKEEPER-3485

Measure reconfiguration time

Improvement Open Minor Unresolved Karolos Antoniadis Karolos Antoniadis Karolos Antoniadis 02/Aug/19 00:00   02/Aug/19 00:11   3.5.5       0 3   This issue is created after some initial discussion in the _dev_ mailing list (subject "Leader election logging during reconfiguration").

 

There does not seem to be a good way to measure reconfiguration time in ZooKeeper. Additionally, reconfiguration time is mixed together with leader election time. For instance, during reconfiguration, ZooKeeper logs a {{LEADER ELECTION TOOK}} message even though no leader election might take place.
 
This can be reproduced by following these steps:
1) start a ZooKeeper cluster (e.g., 3 participants)
2) start a client that connects to some follower
3) perform a _reconfig_ operation that removes the leader from the cluster
 
After the reconfiguration takes place, we can see that the log files of the remaining participants contain a "_LEADER ELECTION TOOK_" message. For example, a line that contains
_2019-07-29 23:07:38,518 [myid:2] - INFO  [QuorumPeer[myid=2](plain=0.0.0.0:2792)(secure=disabled):Follower@75] - FOLLOWING - LEADER ELECTION TOOK - 57 MS_
 
However, no leader election took place, in the sense that no server went _LOOKING_ and then started voting and sending notifications to other participants as it would in a normal leader election. It seems that before the _reconfig_ is committed, the participant that is going to be the next leader is already decided (see here: [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L865]).
 

*Goal* of this issue/improvement is to measure the time it takes for a reconfiguration to complete in a better and more accurate way, as well as to clearly distinguish the measurement of reconfiguration from that of leader election.


 
 
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 6 days ago 0|z059k0:
ZooKeeper ZOOKEEPER-3484

Improve the throughput by optimizing the synchronization around outstandingChanges

Improvement Resolved Major Fixed Yisong Yue Yisong Yue Yisong Yue 01/Aug/19 17:55   10/Sep/19 02:49 09/Sep/19 22:26   3.6.0     0 3 0 3600   The "processRequest(Request request)" function in FinalRequestProcessor.java synchronizes around `outstandingChanges` for all requests now. However, this synchronization is unnecessary for read requests, and skipping such synchronization for reads can improve the overall throughput of the request processor pipeline. 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 2 days ago 0|z059bc:
ZooKeeper ZOOKEEPER-3483

Flaky test: org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats

Test Open Minor Unresolved Michael Han Michael Han Michael Han 01/Aug/19 16:45   01/Aug/19 16:45   3.6.0   tests   0 1   Test org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats consistently passes on the local dev environment but frequently fails on the Jenkins pre-commit build.

For now, disable the test to unblock a couple of pull requests waiting on a green build, until it's completely addressed.

Error for reference:

{code:java}
Error Message
expected:<845466> but was:<1100001>
Stacktrace
java.lang.AssertionError: expected:<845466> but was:<1100001>
at org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats(RequestPathMetricsCollectorTest.java:248)
{code}
flaky-test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks ago 0|z0598w:
ZooKeeper ZOOKEEPER-3482

SASL (Kerberos) Authentication with SSL for clients and Quorum

Improvement Closed Major Fixed Mate Szalay-Beko Jörn Franke Jörn Franke 01/Aug/19 14:30   14/Feb/20 10:23 22/Jan/20 05:10 3.5.5 3.6.0, 3.5.7, 3.7.0 server   2 5 0 7200   It seems that Kerberos authentication does not work for encrypted connections of clients and quorum. It seems that only X509 Authentication works.

What I would have expected:

ClientSecurePort is defined

A keystore and truststore are deployed on the ZooKeeper servers

Only a truststore is deployed with the client (to validate the CA of the server certificate)

Client can authenticate with SASL (Kerberos)

Similarly, it should work for the Quorum SSL connection.

Is there a way to configure this in ZooKeeper?

 

Note: Kerberos authentication for SSL-encrypted connections should be used instead of X509 authentication in this case, not in addition to it. However, if it only works in 3.5.5 in addition to X509, then I would be interested and willing to test it.
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
8 weeks, 1 day ago 0|z0592w:
ZooKeeper ZOOKEEPER-3481

The problem of AcceptedEpoch

Improvement Open Major Unresolved Unassigned tom.long tom.long 31/Jul/19 04:37   05/Feb/20 07:17   3.5.5 3.5.8 quorum   0 3   If the leader has already been elected when a voting participant joins the cluster, the participant can only act as a follower. When Leader.getEpochToPropose is called, it does not participate in the voting. However, if its acceptedEpoch is larger than the leader's, it will never work properly. The status is as follows: LOOKING -> FOLLOWING -> exception -> LOOKING.

The code is as follows (Learner.registerWithLeader(int pktType)):
{code:java}
if (newEpoch > self.getAcceptedEpoch()) {
    wrappedEpochBytes.putInt((int) self.getCurrentEpoch());
    self.setAcceptedEpoch(newEpoch);
} else if (newEpoch == self.getAcceptedEpoch()) {
    // since we have already acked an epoch equal to the leaders, we cannot ack
    // again, but we still need to send our lastZxid to the leader so that we can
    // sync with it if it does assume leadership of the epoch.
    // the -1 indicates that this reply should not count as an ack for the new epoch
    wrappedEpochBytes.putInt(-1);
} else {
    throw new IOException("Leaders epoch, " + newEpoch + " is less than accepted epoch, " + self.getAcceptedEpoch());
}
{code}
 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 4 days ago 0|z056uw:
ZooKeeper ZOOKEEPER-3480

Flaky test CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor

Test Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 31/Jul/19 02:46   06/Aug/19 05:25 05/Aug/19 19:49 3.6.0 3.6.0 tests   0 3 0 7200   Found this flaky test on Jenkins, here is the log:

Error Message

expected:<3> but was:<2>
h3. Stacktrace

java.lang.AssertionError: expected:<3> but was:<2> at org.apache.zookeeper.server.quorum.CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor(CommitProcessorMetricsTest.java:391)
h3. Standard Output

2019-07-31 03:15:17,408 [myid:] - INFO [main:ZKTestCase$1@60] - STARTING testConcurrentRequestProcessingInCommitProcessor 2019-07-31 03:15:17,408 [myid:] - INFO [main:CommitProcessorMetricsTest@52] - setup 2019-07-31 03:15:17,409 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - RUNNING TEST METHOD testConcurrentRequestProcessingInCommitProcessor 2019-07-31 03:15:17,411 [myid:] - INFO [main:CommitProcessor@496] - Configuring CommitProcessor with readBatchSize -1 commitBatchSize 1 2019-07-31 03:15:17,411 [myid:] - INFO [main:CommitProcessor@454] - Configuring CommitProcessor with 24 worker threads. 2019-07-31 03:15:17,461 [myid:] - INFO [main:CommitProcessorMetricsTest$TestCommitProcessor@109] - numWorkerThreads in Test is 3 2019-07-31 03:15:19,466 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@99] - TEST METHOD FAILED testConcurrentRequestProcessingInCommitProcessor java.lang.AssertionError: expected:<3> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.zookeeper.server.quorum.CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor(CommitProcessorMetricsTest.java:391) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 2019-07-31 03:15:19,472 [myid:] - INFO [main:CommitProcessorMetricsTest@68] - tearDown starting 2019-07-31 03:15:19,473 [myid:] - INFO [main:CommitProcessor@646] - Shutting down 2019-07-31 03:15:20,464 [myid:] - INFO [CommitProcessor:1:CommitProcessor@419] - CommitProcessor exited loop! 
2019-07-31 03:15:20,465 [myid:] - INFO [main:ZKTestCase$1@75] - FAILED testConcurrentRequestProcessingInCommitProcessor java.lang.AssertionError: expected:<3> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.zookeeper.server.quorum.CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor(CommitProcessorMetricsTest.java:391) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 2019-07-31 03:15:20,467 [myid:] - INFO [main:ZKTestCase$1@65] - FINISHED testConcurrentRequestProcessingInCommitProcessor

 
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z056pc:
ZooKeeper ZOOKEEPER-3479

Logging false leader election times

Bug Resolved Minor Fixed Karolos Antoniadis Karolos Antoniadis Karolos Antoniadis 30/Jul/19 21:23   02/Aug/19 19:38 02/Aug/19 14:09 3.5.5 3.6.0 leaderElection   0 3 0 11400   There seems to be a problem with the logging of leader election times: the logged times are much smaller than the actual time it took for the leader election to complete.

This bug can be easily reproduced by following these steps:

1) Run a ZK cluster of 3 servers

2) Kill the server that is currently the leader

3) The log files of the remaining 2 servers contain false leader election times

 

In the attached files you can see the log files of the remaining 2 servers. For brevity, I removed the parts before and after the leader election from the log files.

For example, in {{server1.txt}} we can see that:

 
{code:java}
2019-07-31 00:57:31,852 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1318] - PeerState set to LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1193] - LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0x100000001
[...]
2019-07-31 00:57:32,272 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 1 MS{code}
Leader election supposedly took only 1ms, but in reality it took (32,272 - 31,853) = 419ms!

The reason for this bug seems to be the introduction of this line
{code:java}
start_fle = Time.currentElapsedTime();{code}
(seen here [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402]) 

back in this commit [https://github.com/apache/zookeeper/commit/5428cd4bc963c2e653a260c458a8a8edf3fa08ef].
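The effect described above can be sketched with a fake clock. This is an illustrative simplification of the timing logic, not the actual QuorumPeer code: if the start timestamp is reset again shortly before the election finishes, the reported duration covers only the last fragment of the election.

```java
// Hypothetical sketch of the timing bug, using a fake millisecond clock.
public class ElectionTimingSketch {
    static long now; // fake clock, in ms

    static long electionTimeBuggy() {
        long startFle = now; // LOOKING state entered
        now += 400;          // notifications exchanged, voting rounds...
        startFle = now;      // bug: timestamp reset again, late
        now += 1;            // final round
        return now - startFle; // reports 1 ms
    }

    static long electionTimeFixed() {
        long startFle = now; // set once, when LOOKING starts
        now += 400;
        now += 1;
        return now - startFle; // reports the full 401 ms
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + electionTimeBuggy() + " ms");
        System.out.println("fixed: " + electionTimeFixed() + " ms");
    }
}
```

This mirrors the 1 ms vs. 419 ms discrepancy in the {{server1.txt}} log above.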

 

 

 

 
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
32 weeks, 6 days ago 0|z056g8:
ZooKeeper ZOOKEEPER-3478

Leader restart shuts down all the followers

Bug Open Major Unresolved Karolos Antoniadis Lara Catipovic Lara Catipovic 30/Jul/19 07:56   04/Sep/19 14:14   3.4.10       0 3   Hello ZooKeeper Community,

Could you please help me with at least clarifying a few doubts related to ZooKeeper 3.4.10?
We have 2 physical servers in our system, one hosting 3 ZooKeeper servers and the other hosting 2 - meaning that if the server hosting the 3 ZooKeeper servers fails, quorum cannot be achieved.

*Server 11*
Zookeeper server 10
Zookeeper server 11
Zookeeper server 12

*Server 12*
Zookeeper server 20
Zookeeper server 21 -> Leader at the beginning of the procedure

As we were changing something in the configuration, we needed to restart our servers, and to keep the quorum up we restarted the ZooKeeper servers one by one (first on the machine with 3 servers and then on the one with 2).
During the restart of the machine with 3 servers, the quorum was not lost - since we restarted one by one.
Then we tried to restart the servers on the other machine, where we have 2 servers deployed, also one by one.
The restart was executed in a small amount of time. After we restarted the first server, 20 (a follower), it joined the quorum with no errors, as expected.
*After we restarted the Leader server (21), all followers started to shut down!*

We had the same log on all the followers, but here is the example from the follower 20:
{panel}
Jun 27 14:49:31 [myid: 20]: WARN Connection broken for id 21, my id = 20, error =
Jun 27 14:49:31 java.io.EOFException
Jun 27 14:49:31 at java.io.DataInputStream.readInt(Unknown Source)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1013)
Jun 27 14:49:31 [myid: 20]: INFO Accepted socket connection from /192.168.1.116:18532
Jun 27 14:49:31 [myid: 20]: WARN Exception when following the leader
Jun 27 14:49:31 java.io.EOFException
Jun 27 14:49:31 at java.io.DataInputStream.readInt(Unknown Source)
Jun 27 14:49:31 at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
Jun 27 14:49:31 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
Jun 27 14:49:31 [myid: 20]: WARN Connection request from old client /192.168.1.116:18532; will be dropped if server is in r-o mode
Jun 27 14:49:31 [myid: 20]: INFO Notification: 1 (message format version), 12 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) FOLLOWING (my state)
Jun 27 14:49:31 [myid: 20]: WARN Interrupting SendWorker
Jun 27 14:49:31 [myid: 20]: INFO Client attempting to renew session 0xa6b9dc92aa60200 at /192.168.1.116:18532
Jun 27 14:49:31 [myid: 20]: INFO shutdown called
Jun 27 14:49:31 java.lang.Exception: shutdown Follower
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
Jun 27 14:49:31 [myid: 20]: INFO Revalidating client: 0xa6b9dc92aa60200
Jun 27 14:49:31 [myid: 20]: WARN Interrupted while waiting for message on queue
Jun 27 14:49:31 java.lang.InterruptedException
Jun 27 14:49:31 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
Jun 27 14:49:31 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
Jun 27 14:49:31 at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1097)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:932)
{panel}
*Is it expected that Leader in case of its restart triggers shut down of all its followers?*
This seems to me like unexpected behavior, but maybe I'm wrong.

 

After this step, once the servers are up again, 20 tries to become the Leader, and server 21 accepts it and tries to follow the new Leader.
20 received ACK messages from itself and from 21.
There are also notifications sent about a new Leader to all other Zookeeper servers:
{panel}
Jun 27 14:49:31 [myid: 20]: INFO LEADING
Jun 27 14:49:31 [myid: 20]: INFO Created server with tickTime 1500 minSessionTimeout 3000 maxSessionTimeout 30000 datadir /local/cudb/BCServer/version-2 snapdir /local/cudb/BCServer/version-2
Jun 27 14:49:31 [myid: 20]: INFO LEADING - LEADER ELECTION TOOK - 213
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x1 (n.round), LOOKING (n.state), 21 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 21 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Follower sid: 21 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466717f0
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader), 0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING (my state)
{panel}
 

From the servers 10, 11, 12 (located on the server with 3 ZooKeeper servers) it can be seen they all entered the state 'FOLLOWING' and from this step we would expect the Leader to start leading, and followers to start following:
{panel}
Jun 27 14:49:32 [myid: 12]: INFO FOLLOWING
Jun 27 14:49:32 [myid: 12]: INFO Created server with tickTime 1500 minSessionTimeout 3000 maxSessionTimeout 30000 datadir
Jun 27 14:49:32 [myid: 12]: INFO FOLLOWING - LEADER ELECTION TOOK - 1217
{panel}
 

But the servers from our first system (10, 11, 12) are not able to connect to the new Leader (20), and it seems they are trying to connect to the old Leader (21) (assuming this is the case since they are all using port 4512, which corresponds to Server 21). This log can be seen on all servers of the machine where we have 3 ZooKeeper servers deployed (10, 11, 12):
{panel}
Jun 27 14:49:38 [myid: 12]: WARN Unexpected exception, tries=0, connecting to /192.168.1.116:4512
Jun 27 14:49:38 java.net.SocketTimeoutException: connect timed out
Jun 27 14:49:38 at java.net.PlainSocketImpl.socketConnect(Native Method)
Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
Jun 27 14:49:38 at java.net.SocksSocketImpl.connect(Unknown Source)
Jun 27 14:49:38 at java.net.Socket.connect(Unknown Source)
Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
{panel}
*Is my understanding correct, or does this port not indicate they are trying to connect to the wrong server?*

 

Even though the Leader restart provoked the restart of all its followers, it seems that at this step all other servers should start connecting to the new leader and form a quorum of followers.
Instead, a few seconds later, a timeout occurs while waiting for the epoch from the quorum (the followers never start following although they all received notifications, and they keep trying to connect to the old leader), and the 'new' Leader shuts down again:
{panel}
Jun 27 14:49:39 [myid: 20]: WARN Unexpected exception
*Jun 27 14:49:39 java.lang.InterruptedException: Timeout while waiting for epoch from quorum*
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:896)
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:389)
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:950)
Jun 27 14:49:39 [myid: 20]: INFO Shutting down
Jun 27 14:49:39 [myid: 20]: INFO Shutdown called
Jun 27 14:49:39 java.lang.Exception: shutdown Leader! reason: Forcing shutdown
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:956)
Jun 27 14:49:39 [myid: 20]: INFO exception while shutting down acceptor: java.net.SocketException: Socket closed
Jun 27 14:49:39 [myid: 20]: INFO LOOKING
Jun 27 14:49:39 [myid: 20]: INFO New election. My id = 20, proposed zxid=0x66000012c7
Jun 27 14:49:39 [myid: 20]: ERROR Unexpected exception causing shutdown
Jun 27 14:49:39 java.lang.InterruptedException
Jun 27 14:49:39 at java.lang.Object.wait(Native Method)
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:892)
Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
Jun 27 14:49:39 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader), 0x66000012c7 (n.zxid), 0x1a (n.round), LOOKING (n.state), 20 (n.sid), 0x66 (n.peerEpoch) LOOKING (my state)
Jun 27 14:49:39 [myid: 20]: WARN ******* GOODBYE /10.22.0.2:55268 ********
{panel}
 

After 3 unsuccessful retries from servers 10, 11, 12, since the quorum cannot be achieved, the connection times out and the followers shut down again. After they are up, another election is triggered and the new LEADER is now located on the first node (the server that becomes the new leader is 12):
{panel}
Jun 27 14:50:07 [myid: 12]: ERROR Unexpected exception
Jun 27 14:50:07 java.net.SocketTimeoutException: connect timed out
Jun 27 14:50:07 at java.net.PlainSocketImpl.socketConnect(Native Method)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
Jun 27 14:50:07 at java.net.SocksSocketImpl.connect(Unknown Source)
Jun 27 14:50:07 at java.net.Socket.connect(Unknown Source)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
Jun 27 14:50:07 [myid: 12]: WARN Exception when following the leader
Jun 27 14:50:07 java.net.SocketTimeoutException: connect timed out
Jun 27 14:50:07 at java.net.PlainSocketImpl.socketConnect(Native Method)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
Jun 27 14:50:07 at java.net.SocksSocketImpl.connect(Unknown Source)
Jun 27 14:50:07 at java.net.Socket.connect(Unknown Source)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
Jun 27 14:50:07 [myid: 12]: INFO shutdown called
Jun 27 14:50:07 java.lang.Exception: shutdown Follower
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
Jun 27 14:50:07 [myid: 12]: INFO Shutting down
Jun 27 14:50:07 [myid: 12]: INFO LOOKING
Jun 27 14:50:07 [myid: 12]: INFO New election. My id = 12, proposed zxid=0x66000012c7
Jun 27 14:50:08 [myid: 12]: INFO LEADING
{panel}
 

After this, all other Zookeeper servers normally start to follow the new leader and everything starts to work just fine.

 

Could you please help me and answer the following questions:
- is it expected behavior that the Leader shuts down all other servers (followers) after its own restart?
-> if this is expected, could you please explain in which situations we can expect this behavior and why?
- if there was a notification sent about the new leader (20) to all other servers, why were they still connecting to the old leader?
- do you have any recommendations on how to 'fix' this behavior?

Any help will be highly appreciated.
Thanks in advance!

Kind regards,
Lara
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
28 weeks, 1 day ago 0|z055lc:
ZooKeeper ZOOKEEPER-3477

ZOOKEEPER-3170 Flaky test:CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor

Sub-task Resolved Minor Duplicate Unassigned maoling maoling 30/Jul/19 06:08   03/Aug/19 02:27 03/Aug/19 02:27     tests   0 1    

[https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/org.apache.zookeeper$zookeeper/1011/testReport/junit/org.apache.zookeeper.server.quorum/CommitProcessorMetricsTest/testConcurrentRequestProcessingInCommitProcessor/]
{code:java}
Error Message
expected:<3> but was:<2>
Stacktrace
java.lang.AssertionError: expected:<3> but was:<2>
at org.apache.zookeeper.server.quorum.CommitProcessorMetricsTest.testConcurrentRequestProcessingInCommitProcessor(CommitProcessorMetricsTest.java:391)

Standard Output
2019-07-30 08:02:13,023 [myid:] - INFO [main:ZKTestCase$1@60] - STARTING testConcurrentRequestProcessingInCommitProcessor
2019-07-30 08:02:13,023 [myid:] - INFO [main:CommitProcessorMetricsTest@52] - setup
2019-07-30 08:02:13,023 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - RUNNING TEST METHOD testConcurrentRequestProcessingInCommitProcessor
2019-07-30 08:02:13,024 [myid:] - INFO [main:CommitProcessor@496] - Configuring CommitProcessor with readBatchSize -1 commitBatchSize 1
2019-07-30 08:02:13,025 [myid:] - INFO [main:CommitProcessor@454] - Configuring CommitProcessor with 24 worker threads.
2019-07-30 08:02:13,075 [myid:] - INFO [main:CommitProcessorMetricsTest$TestCommitProcessor@109] - numWorkerThreads in Test is 3
2019-07-30 08:02:15,079 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@99] - TEST METHOD FAILED testConcurrentRequestProcessingInCommitProcessor
java.lang.AssertionError: expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at
{code}
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 2 days ago 0|z055gg:
ZooKeeper ZOOKEEPER-3476

ZOOKEEPER-3467 Identify client request for quota control

Sub-task Open Major Unresolved Unassigned Mocheng Guo Mocheng Guo 29/Jul/19 19:17   05/Aug/19 20:25       server   0 2   In order to support quota, we need a way to identify clients. If security is enabled, we might be able to use the secured identity inside the client certificate. But a generalized client-id-based approach would be better to cover scenarios without security.

The proposal here is to utilize existing zookeeper auth protocol to accept client identity.
# The client id should be sent by the client once the connection is established.
# Sending the client id is optional. Note that the server needs to enable the auth provider; if the client sends a client-id auth request without an auth provider on the server side, the request would be denied.
# The client id is JSON with client_id as a mandatory field. Additional fields can be added, like client contact information, client version, etc.
# This client identity will be cached in the server connection and attached to requests from the connection.
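The payload described above can be sketched as follows. This is an illustrative assumption about the JSON shape beyond the mandatory client_id field; the auth scheme name ("clientid") and the extra field names are hypothetical, not part of the proposal text. In a real client the bytes would be handed to the existing auth mechanism (e.g. ZooKeeper.addAuthInfo(scheme, payload)) right after the connection is established.

```java
// Sketch of the proposed client-id auth payload. Built by hand to stay
// dependency-free; a real implementation would use a JSON library.
public class ClientIdPayload {
    static String build(String clientId, String contact, String version) {
        // client_id is the only mandatory field per the proposal;
        // "contact" and "version" are hypothetical optional fields.
        return "{\"client_id\":\"" + clientId + "\""
             + (contact != null ? ",\"contact\":\"" + contact + "\"" : "")
             + (version != null ? ",\"version\":\"" + version + "\"" : "")
             + "}";
    }

    public static void main(String[] args) {
        String payload = build("billing-service", "team@example.com", "1.4.2");
        System.out.println(payload);
        // e.g. zk.addAuthInfo("clientid", payload.getBytes());
        // -- the scheme name here is an assumption.
    }
}
```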
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 3 days ago 0|z05528:
ZooKeeper ZOOKEEPER-3475

ZOOKEEPER-3431 Enable BookKeeper checkstyle configuration on zookeeper-server

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 29/Jul/19 11:23   23/Aug/19 19:23 17/Aug/19 11:13 3.6.0 3.6.0 build   0 2 0 86400   Enable BookKeeper checkstyle configuration on zookeeper-server 100% 100% 86400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
30 weeks, 5 days ago 0|z054nk:
ZooKeeper ZOOKEEPER-3474

ZOOKEEPER-3431 Enable BookKeeper checkstyle configuration on zookeeper-promethus

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 29/Jul/19 11:22   06/Aug/19 05:24 05/Aug/19 09:30 3.6.0 3.6.0 build   0 2 0 13800   Enable BookKeeper checkstyle configuration on zookeeper-promethus 100% 100% 13800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 3 days ago 0|z054nc:
ZooKeeper ZOOKEEPER-3473

Improving successful TLS handshake throughput with concurrent control

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 28/Jul/19 14:51   23/Nov/19 11:08 23/Nov/19 11:08   3.6.0 server   0 1 0 15600   When there are lots of clients trying to re-establish sessions, there might be lots of half-finished handshakes timing out, and those failed ones keep reconnecting to another server and restarting the handshake from the beginning again, which causes a herd effect.
 
And the total number of ZK sessions that can be supported within the session timeout is impacted a lot after enabling TLS.
 
To improve the throughput, we added TLS concurrent control to reduce the herd effect, and from our benchmark this doubled the number of sessions we could support within the session timeout.
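One plausible shape for such concurrent control is a semaphore that caps in-flight handshakes, so each handshake finishes quickly instead of many timing out and retrying elsewhere. The sketch below illustrates the general technique under that assumption; it is not the actual ZooKeeper patch, and the names and knobs are illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch: cap the number of TLS handshakes in flight at once.
public class HandshakeThrottle {
    private final Semaphore permits;
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();

    HandshakeThrottle(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    // Returns false if no permit becomes available in time; the caller
    // would then close or defer the connection instead of handshaking.
    boolean handshake(Runnable doHandshake, long waitMs) throws InterruptedException {
        if (!permits.tryAcquire(waitMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            int cur = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(cur, Math::max); // track peak concurrency
            doHandshake.run();
            return true;
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        HandshakeThrottle t = new HandshakeThrottle(3);
        Thread[] clients = new Thread[20];
        for (int i = 0; i < clients.length; i++) {
            clients[i] = new Thread(() -> {
                try {
                    // Simulate a 10 ms handshake per client.
                    t.handshake(() -> {
                        try { Thread.sleep(10); } catch (InterruptedException e) { }
                    }, 1000);
                } catch (InterruptedException e) { }
            });
            clients[i].start();
        }
        for (Thread c : clients) c.join();
        System.out.println("max concurrent handshakes: " + t.maxObserved.get());
    }
}
```

With 20 simulated clients and 3 permits, the observed peak concurrency never exceeds 3, which is the property that keeps individual handshakes from stalling.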
100% 100% 15600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
16 weeks, 5 days ago 0|z053t4:
ZooKeeper ZOOKEEPER-3472

Treat check request as a write request which needs to wait for the check txn commit from leader

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 28/Jul/19 14:19   06/Aug/19 00:49 05/Aug/19 19:42 3.6.0, 3.5.5 3.6.0 server   0 2 0 1200   Check op is usually used as a sub-op in multi, but in the ZooKeeper server implementation it can also be called separately: the learner will forward this request to the leader, and the leader will check the version against the given version in the request and generate a txn (error) in the quorum.
 
This is kind of a heavier sync to make sure that when the client checks on a learner, the learner is synced up to date with the leader when the check request is being processed. The learner needs to wait for this remote commit before replying to the client in FinalRequestProcessor.
 
There is no explicit API exposed for check, so it does not seem to be a problem, but it could leave an issue if the check API is exposed in the future.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z053so:
ZooKeeper ZOOKEEPER-3471

Potential lock unavailable due to dangling ephemeral nodes left during local session upgrading

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 28/Jul/19 13:50   16/Oct/19 18:12 09/Oct/19 10:49 3.6.0 3.6.0 server   0 3 0 6600   There is a race condition which might be triggered if the client creates a session, upgrades the session with an ephemeral node, then immediately issues a close-session request before the session is removed from the local session tracker.
 
The close-session request will be treated as a local session close request, since the session still exists in the local session tracker; it goes through the ZK pipeline and deletes the session from both the local and global session trackers. Since the session is not tracked anymore, it will leave the ephemeral nodes there.
 
 
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
22 weeks, 1 day ago 0|z053s8:
ZooKeeper ZOOKEEPER-3470

ZOOKEEPER-3170 Flaky test: LearnerMetricsTest.testLearnerMetricsTest()

Sub-task Resolved Major Fixed Mate Szalay-Beko Andor Molnar Andor Molnar 26/Jul/19 08:53   24/Sep/19 07:11 24/Sep/19 04:34 3.6.0 3.6.0 tests   0 5 0 13800   Hi team,

New test testLearnerMetricsTest() added by the following commit failed 2 times on master with the same error:

junit.framework.AssertionFailedError: expected:<10> but was:<9>
at org.apache.zookeeper.server.quorum.LearnerMetricsTest.testLearnerMetricsTest(LearnerMetricsTest.java:88)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80)

[https://github.com/apache/zookeeper/pull/856]
Submitted by: jhuan31

Maybe we should just blame the Thread.sleep(200) in line:77 and replace it with some clever logic.
Please take a look.

Regards,
Andor
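One common replacement for a fixed Thread.sleep in a test like this is a bounded polling wait: check the condition repeatedly until it holds or a deadline passes. The helper below is a sketch of that idea; the name waitFor and its parameters are illustrative, not taken from the eventual fix.

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a condition instead of sleeping a fixed 200 ms.
public class PollingWait {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;          // condition met early; no wasted waiting
            }
            Thread.sleep(pollMs);     // brief back-off between checks
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Example: wait until at least 50 ms have elapsed, polling every 5 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 1000, 5);
        System.out.println("condition met: " + ok);
    }
}
```

In the flaky test, the condition would be "the expected metric value has been recorded", so the test passes as soon as the learner catches up rather than hoping 200 ms is always enough.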
100% 100% 13800 0 flaky, flaky-test, newbie, pull-request-available, test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 2 days ago 0|z052dk:
ZooKeeper ZOOKEEPER-3469

Checkstyle rules for ensure proper javadocs

Improvement Open Major Unresolved Unassigned Zili Chen Zili Chen 25/Jul/19 20:22   09/Aug/19 22:15       build   0 1   ZOOKEEPER-3528 While introducing BookKeeper checkstyle rules (see ZOOKEEPER-3431), we find it worthwhile to have a rule about the javadocs of public classes. Specifically, check that the javadoc of a public class is not empty but valid.

However, providing valid javadocs for public classes that do not have one at the moment is not trivial work. Thus we file this issue to track another pass for doing so.
100% 6600 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
34 weeks ago 0|z051j4:
ZooKeeper ZOOKEEPER-3468

ZOOKEEPER-3431 Enable BookKeeper checkstyle configuration on zookeeper-jute

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 24/Jul/19 13:10   29/Jul/19 15:36 29/Jul/19 11:17     build   0 2 0 12600   Enable BookKeeper checkstyle configuration on zookeeper-jute 100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 3 days ago 0|z04zo8:
ZooKeeper ZOOKEEPER-3467

Complete quota system for zookeeper server

New Feature Open Major Unresolved Unassigned Mocheng Guo Mocheng Guo 23/Jul/19 13:56   23/Jul/19 13:56   3.6.0   server   0 2   ZOOKEEPER-3476 Below are the areas to cover for a complete quota system.
 
1. client identification
2. quota configuration - metric key and value, format, storage
3. metrics collection and export - storage/rate/watch/connection
4. throttling implementation based on metrics inside server/client
 
Related JIRAs:
ZOOKEEPER-451
ZOOKEEPER-1383
ZOOKEEPER-2593
ZOOKEEPER-3301
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
34 weeks, 2 days ago 0|z04y88:
ZooKeeper ZOOKEEPER-3466

ZK cluster converges, but does not properly handle client connections (new in 3.5.5)

Bug Open Major Unresolved Unassigned Jan-Philip Gehrcke Jan-Philip Gehrcke 23/Jul/19 12:48   02/Aug/19 19:04   3.5.5       0 2   Linux Hey, we are exploring switching from ZooKeeper 3.4.14 to ZooKeeper 3.5.5 in [https://github.com/dcos/dcos].

DC/OS coordinates ZooKeeper via Exhibitor. We are not changing anything w.r.t. Exhibitor for now, and are hoping that we can use ZooKeeper 3.5.5 as a drop-in replacement for 3.4.14. This seems to work fine when Exhibitor uses a so-called static ensemble where the individual ZooKeeper instances are known a priori.

However, when Exhibitor discovers the individual ZooKeeper instances (a "dynamic" back-end), I think we observe a regression where ZooKeeper 3.5.5 can get into the following bad state (often, but not always):
# three ZooKeeper instances find each other, leader election takes place (*expected*)
# leader election succeeds: two followers, one leader (*expected*)
# all three ZK instances respond IAMOK to RUOK  (*expected*)
# all three ZK instances respond to SRVR (one says "Mode: leader", the other two say "Mode: follower")  (*expected*)
# all three ZK instances respond to MNTR and show plausible output (*expected*)
# *{color:#ff0000}Unexpected:{color}* any ZooKeeper client trying to connect to any of the three nodes observes a "connection timeout", whereas notably this is *not* a TCP connect() timeout. The TCP connect() succeeds, but then ZK does not seem to send the expected byte sequence to the TCP connection, and the ZK client waits for it via recv() until it hits a timeout condition. Examples for two different clients:
## In Kazoo we specifically hit _Connection time-out: socket time-out during read_
generated here: [https://github.com/python-zk/kazoo/blob/88b657a0977161f3815657878ba48f82a97a3846/kazoo/protocol/connection.py#L249]
## In zkCli we see  _Client session timed out, have not heard from server in 15003ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn:main-SendThread(localhost:2181))_
# This state is stable; it lasts forever (well, at least for multiple hours, and we didn't test longer than that).
# In our system the ZooKeeper clients are crash-looping. They retry. What I have observed is that while they retry the ZK ensemble accumulates outstanding requests, here shown from MNTR output (emphasis mine): 
zk_packets_received 2008
zk_packets_sent 127
zk_num_alive_connections 18
zk_outstanding_requests *1880*
# The leader emits log lines confirming session timeout, example:
_[myid:3] INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x2000642b18f0020, timeout of 10000ms exceeded [myid:3] INFO [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x2000642b18f0020_
# In this state, restarting either of the two ZK followers results in the same state (clients don't get data from ZK upon connect).
# In this state, restarting the ZK leader, and therefore triggering a leader re-election, almost immediately results in all clients being able to connect to all ZK instances successfully.
32 weeks, 6 days ago 0|z04y4o:
ZooKeeper ZOOKEEPER-3465

ZOOKEEPER-3431 Introduce BookKeeper checkstyle configuration

Sub-task Resolved Major Fixed Zili Chen Zili Chen Zili Chen 17/Jul/19 05:10   29/Jul/19 11:21 29/Jul/19 11:21   3.6.0 build   0 2 0 3600   To enable the BookKeeper checkstyle configuration, we introduce a silent configuration.

Also:

1. Remove the FileContentsHolder rule for checkstyle version compatibility.
2. Check that all rules from the simple checkstyle configuration are included.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 3 days ago 0|z04r3s:
ZooKeeper ZOOKEEPER-3464

ZOOKEEPER-3431 enforce checkstyle in the zookeeper-server module and clean the package:admin and client

Sub-task Resolved Major Duplicate maoling maoling maoling 17/Jul/19 04:00   19/Dec/19 18:01 23/Aug/19 19:23     build   0 2 0 7800   100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 1 day ago 0|z04qz4:
ZooKeeper ZOOKEEPER-3463

Enable warning messages in maven compiler plugin

Task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 16/Jul/19 09:57   16/Oct/19 14:59 24/Jul/19 09:44 3.6.0, 3.5.5, 3.4.14 3.5.6 build   0 2 0 10800   "Show Warnings" is off by default in Maven Compiler Plugin. This invalidates our most recent setting of -Werror (treat warnings as errors).

Let's enable compiler warning messages in all projects and adjust Xdoclint setting:
{noformat}
<configuration>
  <showWarnings>true</showWarnings>
  <compilerArgs>
    ...
    <compilerArg>-Xdoclint:-missing</compilerArg>
    ...
  </compilerArgs>
</configuration>{noformat}
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
34 weeks, 1 day ago 0|z04pxc:
ZooKeeper ZOOKEEPER-3462

Drop Java 9 support

Task Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 12/Jul/19 11:05   08/Oct/19 15:22 08/Oct/19 15:22 3.6.0, 3.5.5, 3.4.14   documentation, server   0 3 0 3000   Java 9 is EOL already. We should drop supporting it in our builds.
- Delete Jenkins jobs,
- Modify docs and highlight which versions of JDK we are actively testing,
- Java 9 related issues are no blockers for releases.

This ticket is based on the following vote: http://markmail.org/thread/t6cefp3qjbvbrzhg
100% 100% 3000 0 java9, jdk9, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 2 days ago 0|z04mcw:
ZooKeeper ZOOKEEPER-3461

add a doc about the admin server command

Improvement Resolved Major Duplicate maoling maoling maoling 12/Jul/19 03:33   22/Aug/19 06:28 22/Aug/19 06:28 3.6.0   documentation   0 2   {code:java}
<a name="sc_adminserver"></a>

#### The AdminServer

**New in 3.5.0:** The AdminServer is
an embedded Jetty server that provides an HTTP interface to the four
letter word commands. By default, the server is started on port 8080,
and commands are issued by going to the URL "/commands/\[command name]",
e.g., http://localhost:8080/commands/stat. The command response is
returned as JSON. Unlike the original protocol, commands are not
{code}
35 weeks, 6 days ago 0|z04lq0:
ZooKeeper ZOOKEEPER-3460

Zookeeper 3.4.13: keeps crashing after a repave in cloudnative environment.

Bug Resolved Major Cannot Reproduce Unassigned Chandrasekhar Chandrasekhar 10/Jul/19 12:35   24/Jul/19 10:43 24/Jul/19 10:43 3.4.13   other   0 2   Kubernetes Cloud Native Environment. We used the minimal binary installation of ZooKeeper, and every time after a repave ZooKeeper keeps crashing with the following logs...

I have attached the zookeeper crash logs and deployment information. Is this related to one of the NULL Pointer Issues mentioned in https://issues.apache.org/jira/browse/ZOOKEEPER-3009 ?

We are trying to find the exact issue here so our cloud native platform guys can help us further. Kindly let us know how to turn on debugging further.
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
34 weeks, 1 day ago 0|z04je0:
ZooKeeper ZOOKEEPER-3459

Add admin command to display synced state of peer

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 09/Jul/19 15:54   15/Jul/19 19:28 15/Jul/19 08:15 3.6.0 3.6.0 server   0 2 0 3000   Add another command to the admin server that will respond with the current phase of the Zab protocol that a given peer is running. This will help with understanding what is going on in an ensemble while it is settling after a leader election and with programmatically checking for a healthy "broadcast" state.


 
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 3 days ago 0|z04i88:
ZooKeeper ZOOKEEPER-3458

ZK 3.5.5 : Dynamic SecureClientPort and Server Specs

Improvement Resolved Major Duplicate Unassigned Fredrick Eisele Fredrick Eisele 08/Jul/19 11:51   10/Jul/19 06:59 10/Jul/19 06:59 3.5.5   java client   0 2   ZK 3.5.5 : Dynamic configuration of SecureClientPort and Server Specs

The server specification is ...

{{server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>}}

 
The clientPort and clientPortAddress are accommodated, but I do not see a provision for secureClientPort.
 
secureClientPort and secureClientPortAddress
were not made part of the dynamic configuration introduced in ZK 3.5.5
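For context, the dynamic server specification only has slots for the election ports, the role, and the plaintext client port; a sketch of a dynamic entry (addresses and ports illustrative), with the secure port still configured statically:

```
server.1=10.0.0.1:2888:3888:participant;0.0.0.0:2181
# There is no secure-port slot above; secureClientPort remains a
# static zoo.cfg property, e.g.:
# secureClientPort=2281
```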
36 weeks, 3 days ago Yes, this is the same as ZOOKEEPER-3166 0|z04gqo:
ZooKeeper ZOOKEEPER-3457

Code optimization in QuorumCnxManager

Improvement Resolved Trivial Fixed tom.long tom.long tom.long 08/Jul/19 02:28   27/Jul/19 19:12 27/Jul/19 14:05 3.5.5 3.6.0 quorum   0 3 3600 3600 0%
Dear developer:
I think the following code in line 623 of the QuorumCnxManager class can be optimized:

{code:java}
ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(
        SEND_CAPACITY);
ArrayBlockingQueue<ByteBuffer> oldq = queueSendMap.putIfAbsent(sid, bq);
if (oldq != null) {
    addToSendQueue(oldq, b);
} else {
    addToSendQueue(bq, b);
}
{code}
The optimization is as follows:
{code:java}
ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.computeIfAbsent(
        sid, serverId -> new ArrayBlockingQueue<>(SEND_CAPACITY));
addToSendQueue(bq, b);
{code}
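For readers comparing the two forms, the equivalence can be seen in a small self-contained example (not ZooKeeper code; the map, key, and {{SEND_CAPACITY}} here are stand-ins): {{computeIfAbsent}} returns the existing queue if one is already mapped, and otherwise creates, installs, and returns a new one, atomically.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: computeIfAbsent collapses the putIfAbsent-then-branch pattern
// into a single atomic call on a ConcurrentMap.
public class ComputeIfAbsentDemo {
    static final int SEND_CAPACITY = 1;

    public static void main(String[] args) {
        ConcurrentMap<Long, ArrayBlockingQueue<String>> queueSendMap =
                new ConcurrentHashMap<>();

        // First call creates the queue; second call reuses the same instance.
        ArrayBlockingQueue<String> q1 = queueSendMap.computeIfAbsent(
                1L, sid -> new ArrayBlockingQueue<>(SEND_CAPACITY));
        ArrayBlockingQueue<String> q2 = queueSendMap.computeIfAbsent(
                1L, sid -> new ArrayBlockingQueue<>(SEND_CAPACITY));

        System.out.println(q1 == q2);            // same queue both times
        System.out.println(queueSendMap.size()); // exactly one entry
    }
}
```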
0% 0% 3600 3600 easyfix, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 5 days ago https://github.com/apache/zookeeper/pull/1021 0|z04fwo:
ZooKeeper ZOOKEEPER-3456

Service temporarily unavailable due to an ongoing leader election. Please refresh

Bug Open Major Unresolved Unassigned Marzieh Marzieh 07/Jul/19 07:55   17/Jul/19 01:32     3.4.14 server   0 2   docker container with Ubuntu 16.04 Hi

I configured ZooKeeper with four nodes for my Mesos cluster with Marathon. When I ran the Flink JSON file on Marathon, it ran without problems. But when I entered the IPs of my two slaves, only one slave showed the Flink UI and the other slave showed this error:

 

Service temporarily unavailable due to an ongoing leader election. Please refresh

I checked "zookeeper.out" file and it said that :

 

2019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - Resolved hostname: 0.0.0.0 to address: /0.0.0.0
2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.32.0.3 to address: /10.32.0.3
2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.32.0.2 to address: /10.32.0.2
2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.32.0.5 to address: /10.32.0.5
2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - Non-optimial configuration, consider an odd number of servers.
2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting quorum peer
2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$
2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)

 

I searched a lot and could not find the solution.
35 weeks, 1 day ago 0|z04fi8:
ZooKeeper ZOOKEEPER-3455

Java 13 build failure on trunk: UnifiedServerSocketTest.testConnectWithoutSSLToStrictServer

Test Closed Major Fixed Mate Szalay-Beko Andor Molnar Andor Molnar 02/Jul/19 08:54   16/Oct/19 14:59 02/Aug/19 07:55 3.6.0 3.6.0, 3.5.6 tests   0 2 0 6000   The following tests are constantly failing on Java 13 trunk builds:

org.apache.zookeeper.server.quorum.UnifiedServerSocketTest.testConnectWithoutSSLToStrictServer

[https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-java13/155/]
100% 100% 6000 0 Java13, jdk13, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 6 days ago 0|z04ark:
ZooKeeper ZOOKEEPER-3454

add a new CLI: quorumInfo

New Feature Resolved Major Invalid maoling maoling maoling 02/Jul/19 05:54   14/Oct/19 06:17 14/Oct/19 06:17     java client   0 1   ./etcdctl -w table --endpoints=127.0.0.1:2379,127.0.0.1:3379,127.0.0.1:4379 endpoint status
+----------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 2d6d41b3de34869d | 3.3.11 | 20 kB | true | 46 | 18 |
| 127.0.0.1:3379 | fa452fb497312bf1 | 3.3.11 | 20 kB | false | 46 | 18 |
| 127.0.0.1:4379 | a20a53b92b8e7e56 | 3.3.11 | 20 kB | false | 46 | 18 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
[Using a hostname instead of an IP fails with an error]: Error: dial tcp: lookup etcd-1 on 10.9.255.1:53: no such host
./etcdctl -w table --endpoints=etcd-1:2379,etcd-2:3379,etcd-3:4379 endpoint status


./etcdctl endpoint health
./etcdctl --endpoints=127.0.0.1:2379,127.0.0.1:3379,127.0.0.1:4379 health
127.0.0.1:2379 is healthy: successfully committed proposal: took = 1.454841ms

./etcdctl member list
3fcd54d1766f30b4: name=etcd-2 peerURLs=http://127.0.0.1:2381 clientURLs=http://127.0.0.1:3379,http://127.0.0.1:3379 isLeader=false
48e24310bdb358ce: name=etcd-1 peerURLs=http://127.0.0.1:2380 clientURLs=http://127.0.0.1:2379,http://127.0.0.1:2379 isLeader=true
66846ef509b1c4d7: name=etcd-3 peerURLs=http://127.0.0.1:2382 clientURLs=http://127.0.0.1:4379,http://127.0.0.1:4379 isLeader=false
37 weeks, 2 days ago 0|z04ajk:
ZooKeeper ZOOKEEPER-3453

missing 'SET' in zkCli on windows

Improvement Closed Minor Fixed Unassigned Jorg Heymans Jorg Heymans 02/Jul/19 03:55   14/Feb/20 10:23 12/Jul/19 10:54   3.6.0, 3.5.7     0 2 0 12600   this is printed during startup because the {{set}} keyword is missing:

{{'ZOO_LOG_FILE' is not recognized as an internal or external command, operable program or batch file.}}
100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 6 days ago 0|z04acw:
ZooKeeper ZOOKEEPER-3452

ZOOKEEPER-3451 Document ZOOKEEPER-3174 Quorum TLS - support reloading trust/key store

Sub-task Open Major Unresolved Unassigned Andor Molnar Andor Molnar 01/Jul/19 10:56   01/Jul/19 10:56   3.5.5   documentation   0 1   Create documentation for trust/key store reloading feature. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
37 weeks, 3 days ago 0|z049h4:
ZooKeeper ZOOKEEPER-3451

Quorum TLS (umbrella)

Task Open Major Unresolved Andor Molnar Andor Molnar Andor Molnar 01/Jul/19 10:49   01/Jul/19 10:54   3.5.5   server   0 1   ZOOKEEPER-236, ZOOKEEPER-2750, ZOOKEEPER-3172, ZOOKEEPER-3173, ZOOKEEPER-3174, ZOOKEEPER-3175, ZOOKEEPER-3176, ZOOKEEPER-3194, ZOOKEEPER-3229, ZOOKEEPER-3375, ZOOKEEPER-3384, ZOOKEEPER-3443, ZOOKEEPER-3452 Umbrella ticket for Quorum TLS related issues. 100% 295200 0 ssl-tls 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
37 weeks, 3 days ago 0|z049gg:
ZooKeeper ZOOKEEPER-3450

ZOOKEEPER-3449 Allow Quorum Leader to Register with Another Quorum

Sub-task Open Major Unresolved Unassigned David Mollitor David Mollitor 01/Jul/19 09:59   01/Jul/19 10:00       server   0 1   When a ZK instance within a quorum is elected leader, add a feature that registers all the servers within the quorum with another ZK quorum.

The idea is ultimately that all quorums within an environment register with a single global quorum so that clients can discover the service-specific quorums.

{code:none}
/zk/register/<service>/quorum{host1,host2,host3}

/zk/register/hbase/quorum{host1,host2,host3}
/zk/register/hdfs/quorum{host1,host2,host3}
/zk/register/yarn/quorum{host1,host2,host3}
{code}

Note that a single quorum should be able to register (be responsible for) several services.
37 weeks, 3 days ago 0|z049d4:
ZooKeeper ZOOKEEPER-3449

Zero Configuration Discovery of ZooKeeper Quorum

New Feature Open Major Unresolved Unassigned David Mollitor David Mollitor 01/Jul/19 09:53   01/Jul/19 09:53       server   0 1   ZOOKEEPER-3450 ZooKeeper is often used for [service discovery|https://curator.apache.org/curator-x-discovery/index.html]. However, how are clients to discover the ZooKeeper quorum itself?

 Provide a mechanism for clients to automatically discover a ZooKeeper quorum.

[https://en.wikipedia.org/wiki/Zero-configuration_networking]
37 weeks, 3 days ago 0|z049cw:
ZooKeeper ZOOKEEPER-3448

Introduce MessageTracker to assist debugging leader and learner connectivity issues

Improvement Resolved Major Fixed Michael Han Michael Han Michael Han 28/Jun/19 18:19   24/Aug/19 01:23 23/Aug/19 13:43 3.6.0 3.6.0 server   0 2 0 18000   We want better insight into the state of the world when learners lose their connection with the leader, so we need to capture more information when that happens. We capture it through a MessageTracker, which records the last few sent and received messages at various protocol stages; this information is dumped to log files for further analysis. 100% 100% 18000 0 Twitter, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks, 5 days ago 0|z047pc:
ZooKeeper ZOOKEEPER-3447

add a doc: zookeeperMonitor.md

New Feature Resolved Major Fixed maoling maoling maoling 28/Jun/19 07:03   02/Aug/19 11:46 02/Aug/19 08:03   3.6.0 documentation   0 2 0 3600   100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 6 days ago 0|z0473s:
ZooKeeper ZOOKEEPER-3446

Enable UnusedImports and RedundantImport checkstyle rules

Improvement Resolved Major Won't Fix Zili Chen Zili Chen Zili Chen 27/Jun/19 05:57   11/Sep/19 16:31 16/Jul/19 06:30 3.5.6   build   0 1 0 3600   Propose enabling {{UnusedImports}} and {{RedundantImport}} checkstyle rules in all modules.

cc [~andorm] [~eolivelli] for decision.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 2 days ago 0|z045ko:
ZooKeeper ZOOKEEPER-3445

Concurrency issue in ReferenceCountedACLCache

Bug Resolved Critical Invalid Unassigned Jonathan Halterman Jonathan Halterman 25/Jun/19 16:29   28/Jun/19 11:18 28/Jun/19 11:16 3.5.5, 3.4.14   server   0 2 0 9600   While debugging some unexpected "ACL not available for long" exceptions we were seeing, I noticed that ReferenceCountedACLCache does not mark aclIndex as volatile, which it should since it appears to be read from multiple threads. This may or may not be the cause of the behavior we're seeing, but should be fixed regardless. 100% 100% 9600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
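As a sketch of one possible fix (not the actual ReferenceCountedACLCache code; the class and field names here are stand-ins), an {{AtomicLong}} gives both cross-thread visibility and atomic read-modify-write for an index shared across threads, which a plain {{long}} field does not:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a plain long field updated from several threads has neither
// visibility nor atomicity guarantees; AtomicLong provides both.
public class AclIndexDemo {
    private final AtomicLong aclIndex = new AtomicLong();

    long nextIndex() {
        return aclIndex.incrementAndGet(); // atomic read-modify-write
    }

    public static void main(String[] args) throws InterruptedException {
        AclIndexDemo cache = new AclIndexDemo();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) cache.nextIndex();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With a plain non-volatile long this count could be lost-update short.
        System.out.println(cache.aclIndex.get());
    }
}
```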
37 weeks, 6 days ago 0|z043m0:
ZooKeeper ZOOKEEPER-3444

Latest stable version will not start on Windows Server 2016 (DataCenter edition) and Java 8

Bug Open Major Unresolved Unassigned Bobi Traykov Bobi Traykov 25/Jun/19 04:49   26/Jun/19 06:40   3.5.5   java client   0 2   Windows Server 2016 DataCenter edition
Firewall and User Access Control - both full disabled.

+Java used+:
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)


+Java package (from the Oracle site)+: jre-8u212-windows-x64.exe


Used the bin package of the ZooKeeper release.
When attempting to start ZooKeeper using the *ZKServer.cmd* batch file, I receive the following error:
{noformat}
D:\apache-zookeeper-3.5.5-bin\bin>call "D:\jre1.8.0_212"\bin\java "-Dzookeeper.log.dir=D:\apache-zookeeper-3.5.5-bin\bin\..\logs" "-Dzookeeper.root.logger=ALL1,CONSOLE" "-Dzookeeper.log.file=zookeeper-ironman-server-TF-AMIR-BASTION.log" "-XX:+HeapDumpOnOutOfMemoryError" "-XX:OnOutOfMemoryError=cmd /c taskkill /pid %%p /t /f" -cp "D:\apache-zookeeper-3.5.5-bin\bin\..\build\classes;D:\apache-zookeeper-3.5.5-bin\bin\..\build\lib\*;D:\apache-zookeeper-3.5.5-bin\bin\..\*;D:\apache-zookeeper-3.5.5-bin\bin\..\lib\*;D:\apache-zookeeper-3.5.5-bin\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "D:\apache-zookeeper-3.5.5-bin\bin\..\conf\zoo.cfg"
2019-06-25 08:40:41,391 [myid:] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: D:\apache-zookeeper-3.5.5-bin\bin\..\conf\zoo.cfg
2019-06-25 08:40:41,391 [myid:] - ERROR [main:QuorumPeerMain@89] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing D:\apache-zookeeper-3.5.5-bin\bin\..\conf\zoo.cfg
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:154)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:113)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
Caused by: java.lang.IllegalArgumentException: dataDir is not set
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:368)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:150)
... 2 more
Invalid config, exiting abnormally{noformat}
 

 

The *zoo.cfg* is a copy-paste from the *zoo_sample.cfg* - the *dataDir* parameter is pointing to E:/zookeeper - an existing empty directory.

The previous two stable releases start without any issues:

zookeeper-3.4.10.tar.gz
zookeeper-3.4.14.tar.gz

 

Last, but not least, the 3.5.5 release works fine for me on Windows 10 (fully updated), Windows Server 2008 R2 and Windows Server 2012 R2.
38 weeks, 1 day ago 0|z042m8:
ZooKeeper ZOOKEEPER-3443

ZOOKEEPER-3451 Add support for PKCS12 trust/key stores

Sub-task Closed Major Fixed Ivan Yurchenko Ivan Yurchenko Ivan Yurchenko 25/Jun/19 02:46   16/Oct/19 14:58 15/Jul/19 08:49   3.6.0, 3.5.6 server   0 2 0 10200   Let's add PKCS12 support for trust and key stores in client and quorum TLS. 100% 100% 10200 0 pull-request-available, ssl-tls 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 3 days ago 0|z042i0:
ZooKeeper ZOOKEEPER-3442

OWASP jenkins failing due to jackson databind CVE published

Bug Resolved Blocker Duplicate Unassigned Patrick D. Hunt Patrick D. Hunt 24/Jun/19 13:32   11/Sep/19 16:33 24/Jun/19 14:52 3.6.0, 3.5.5, 3.4.14       0 1   The OWASP job is failing due to a medium priority jackson databind issue.

http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2019-12814

We should upgrade the dependency version. I looked into the issue and it should be straightforward; however, the new dependency (2.9.9.1) is not yet available upstream. Once it is, we should upgrade.
38 weeks, 3 days ago 0|z041y0:
ZooKeeper ZOOKEEPER-3441

OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814

Task Closed Critical Fixed Patrick D. Hunt Enrico Olivelli Enrico Olivelli 23/Jun/19 16:21   16/Oct/19 14:59 12/Jul/19 01:33 3.6.0 3.6.0, 3.4.15, 3.5.6 build, security   0 2 0 10800   OWASP dependency checker is flagging jackson-databind-2.9.9.jar for CVE-2019-12814 (https://nvd.nist.gov/vuln/detail/CVE-2019-12814)
We should upgrade the library but we are currently using the latest and greatest 2.9.9.


{noformat}
A Polymorphic Typing issue was discovered in FasterXML jackson-databind 2.x through 2.9.9. When Default Typing is enabled (either globally or for a specific property) for an externally exposed JSON endpoint and the service has JDOM 1.x or 2.x jar in the classpath, an attacker can send a specifically crafted JSON message that allows them to read arbitrary local files on the server.
{noformat}

We don't have jdom on the classpath, so we are not directly affected by this change, but users running the ZooKeeper server in a custom environment should take note of this issue.

this is the issue on Jackson: https://github.com/FasterXML/jackson-databind/issues/2341
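Once a fixed artifact is published, the bump itself is a one-line change if the version is managed centrally; a sketch of the POM fragments involved (the property name is illustrative, not necessarily what the ZooKeeper build uses):

```xml
<properties>
  <!-- bump once the patched jackson-databind release is available upstream -->
  <jackson.version>2.9.9</jackson.version>
</properties>

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson.version}</version>
</dependency>
```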




100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 6 days ago 0|z040t4:
ZooKeeper ZOOKEEPER-3440

Fix Apache RAT check by excluding binary files (images)

Bug Closed Critical Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 22/Jun/19 18:37   16/Oct/19 14:58 28/Jun/19 18:38 3.6.0 3.6.0, 3.5.6 build, documentation   0 3 0 7200   I see this error on Jenkins as we are missing the exclusion for the images of the docs.

{code:java}
Unapproved licenses:

/home/jenkins/jenkins-slave/workspace/zookeeper-master-maven/zookeeper-docs/src/main/resources/markdown/images/state_dia.dia
{code}

We should also add this check to the precommit job on Travis (this will be part of the commit) and on CI (this is a manual configuration, to be done after fixing this issue)
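The exclusion itself is a small addition to the apache-rat-plugin configuration; a sketch (the glob pattern is illustrative, matching the path in the error above):

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- binary images cannot carry a license header -->
      <exclude>**/src/main/resources/markdown/images/**</exclude>
    </excludes>
  </configuration>
</plugin>
```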
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 1 day ago 0|z040gw:
ZooKeeper ZOOKEEPER-3439

Observability improvements on client / server connection close

Improvement Resolved Major Fixed Michael Han Michael Han Michael Han 21/Jun/19 20:18   05/Jul/19 18:33 02/Jul/19 18:45 3.6.0 3.6.0 server   0 2 0 4800   Currently, when the server closes a client connection, not enough information is recorded (except for a few exception logs), which makes postmortems hard. On the other hand, a complete view of the aggregated connection-closing reasons provides more signals with which we can better operate the clusters (e.g. predict that an incident might happen based on trends in the connection-closing reasons). 100% 100% 4800 0 Twitter, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
37 weeks, 1 day ago 0|z03zyo:
ZooKeeper ZOOKEEPER-3438

Flaky test:org.apache.zookeeper.server.PrepRequestProcessorMetricsTest.testPrepRequestProcessorMetrics

Bug Resolved Minor Not A Problem Unassigned maoling maoling 21/Jun/19 05:46   19/Dec/19 17:59 22/Jun/19 06:57     tests   0 1   [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/org.apache.zookeeper$zookeeper/844/testReport/junit/org.apache.zookeeper.server/PrepRequestProcessorMetricsTest/testPrepRequestProcessorMetrics/]
{code:java}
Error Message
expected:<5> but was:<4>
Stacktrace
java.lang.AssertionError: expected:<5> but was:<4>
at org.apache.zookeeper.server.PrepRequestProcessorMetricsTest.testPrepRequestProcessorMetrics(PrepRequestProcessorMetricsTest.java:146)

Standard Output
2019-06-21 09:09:37,915 [myid:] - INFO [main:ZKTestCase$1@60] - STARTING testPrepRequestProcessorMetrics
2019-06-21 09:09:37,917 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - RUNNING TEST METHOD testPrepRequestProcessorMetrics
2019-06-21 09:09:37,951 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@1002] - Failed to process sessionid:0x1 type:setData cxid:0x0 zxid:0x1 txntype:5 reqpath:n/a
java.lang.NullPointerException
at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:521)
at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:872)
at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:156)
2019-06-21 09:09:37,952 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@1015] - Dumping request buffer: 0x00042f666f6f0000ffffffff
2019-06-21 09:09:37,959 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@99] - TEST METHOD FAILED testPrepRequestProcessorMetrics
java.lang.AssertionError: expected:<5> but was:<4>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.zookeeper.server.PrepRequestProcessorMetricsTest.testPrepRequestProcessorMetrics(PrepRequestProcessorMetricsTest.java:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
2019-06-21 09:09:37,960 [myid:] - INFO [main:ZKTestCase$1@75] - FAILED testPrepRequestProcessorMetrics
java.lang.AssertionError: expected:<5> but was:<4>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.zookeeper.server.PrepRequestProcessorMetricsTest.testPrepRequestProcessorMetrics(PrepRequestProcessorMetricsTest.java:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
2019-06-21 09:09:37,960 [myid:] - INFO [main:ZKTestCase$1@65] - FINISHED testPrepRequestProcessorMetrics
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
38 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3437

Improve sync throttling on a learner master

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 20/Jun/19 15:19   25/Jul/19 23:51 25/Jul/19 15:43 3.6.0 3.6.0 quorum   0 2 0 13200   As described in ZOOKEEPER-1928, a leader can become overloaded if it sends too many snapshots concurrently during sync time. Sending too many diffs at the same time can cause the same overload problem.

In this JIRA, we will:
# add diff sync throttling in addition to snap sync throttling
# extend the protection to followers that serve observers
# improve the counting of concurrent snap syncs/diff syncs to avoid double counting or missing counting
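The counting fix in point 3 above is essentially an atomic begin/end protocol around each sync. A minimal sketch of the idea (the class and method names here are illustrative, not the actual ZooKeeper patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative throttler: at most `limit` snap/diff syncs run concurrently,
// and compare-and-set guarantees each sync is counted exactly once.
public class SyncThrottler {
    private final int limit;
    private final AtomicInteger inflight = new AtomicInteger(0);

    public SyncThrottler(int limit) {
        this.limit = limit;
    }

    /** Returns true if a new sync may start; pair every success with endSync(). */
    public boolean beginSync() {
        while (true) {
            int current = inflight.get();
            if (current >= limit) {
                return false; // saturated: reject rather than overload the master
            }
            if (inflight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Call exactly once per successful beginSync(). */
    public void endSync() {
        inflight.decrementAndGet();
    }
}
```

With the same guard applied to both snap and diff syncs, a follower serving observers gets the same protection as the leader.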
100% 100% 13200 0 pull-request-available
33 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3436

Enhance Mavenized Make C client

Improvement Resolved Critical Fixed Mate Szalay-Beko Enrico Olivelli Enrico Olivelli 20/Jun/19 14:15   10/Oct/19 13:00 10/Oct/19 08:43 3.6.0, 3.5.5 3.6.0 c client   0 4 0 22200   We want to be able to build the C client with Maven using these commands:

Jump to the directory
{code}
cd zookeeper-client/zookeeper-c-client
{code}

Build without running tests
{code}
mvn clean install -DskipTests
{code}

Build and run tests
{code}
mvn clean install
{code}

From the root directory we will have:
{code}
mvn clean install -Pfull-build -DskipTests
{code}
and (with tests)
{code}
mvn clean install -Pfull-build
{code}
100% 100% 22200 0 pull-request-available
23 weeks ago
ZooKeeper ZOOKEEPER-3435

client session expired after timeout after errors and warning logs in zk server logs

Bug Open Major Unresolved Unassigned prashant prashant 20/Jun/19 10:43   20/Jun/19 10:43   3.5.1       0 1   Hi

We use the 3.5.1-alpha version.

We are seeing a session expiry issue in a VM setup.

This is running in replicated mode (two servers + node mastership as one vote for quorum).

We see the client session expired after the session timeout (of 10 sec).

This connection was to the local ZK server. The session timeout is 10 sec.

This session was established at 17:40:18 and the ZK server expired it at 17:40:57, 39 seconds after establishment.

In between this time, I see a few errors and warnings in the ZooKeeper server logs (as shown below).

I see the errors/warnings below during this window, before the session expiry.

This issue is not easy to reproduce; so far we have seen it only twice.

Could you please help me identify the root cause and let me know if this is fixed in a later release? Thanks, Prashant

Logs are in the mail below:

[https://mail-archives.apache.org/mod_mbox/zookeeper-user/201906.mbox/browser]
39 weeks ago
ZooKeeper ZOOKEEPER-3434

[FileTabCharacter] clear up all the checkstyle violations in the zookeeper-server module

Improvement Resolved Minor Won't Fix maoling maoling maoling 19/Jun/19 22:57   25/Jul/19 20:30 25/Jul/19 20:30     build   0 1 0 2400   [FileTabCharacter] clear up all the checkstyle violations in the zookeeper-server module 100% 100% 2400 0 pull-request-available
39 weeks ago
ZooKeeper ZOOKEEPER-3433

zkpython build broken after maven migration

Bug Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 18/Jun/19 21:07   16/Oct/19 14:59 25/Jun/19 09:51 3.6.0, 3.5.5, 3.4.14 3.6.0, 3.5.6 contrib-bindings   0 1 0 6000   zkpython is not building after the migration to the Maven directory structure. 100% 100% 6000 0 pull-request-available
38 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3432

Improving zookeeper trace for performance and scalability

Improvement Open Major Unresolved Unassigned Mocheng Guo Mocheng Guo 18/Jun/19 16:36   22/Jan/20 08:20   3.5.6   server   0 1 0 9000   Current server traces go into normal local log files, which does not scale and has a negative impact on server performance when turned on. The proposed improvement is to write traces asynchronously, with a configurable in-memory buffer, to a separate process which can be on different hardware, so that large volumes of traces can be processed and persisted without affecting ZooKeeper server performance. 100% 100% 9000 0 pull-request-available
39 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3431

Enable BookKeeper checkstyle configuration

Task Resolved Major Fixed Zili Chen maoling maoling 18/Jun/19 05:46   25/Sep/19 11:58 27/Aug/19 04:21   3.6.0 build   0 3   ZOOKEEPER-3464, ZOOKEEPER-3465, ZOOKEEPER-3468, ZOOKEEPER-3474, ZOOKEEPER-3475, ZOOKEEPER-3517 As discussed in the [mailing list|https://lists.apache.org/thread.html/245557316cbe91e6e189b215eff93c65aac5b5aae355dfe461a84c7b@%3Cdev.zookeeper.apache.org%3E], our community decided to enable a more meaningful checkstyle configuration, specifically, BookKeeper's checkstyle configuration.

Breakdown of the implementation steps:

1. Introduce BookKeeper's checkstyle configuration.
2. Enable this checkstyle configuration per module; for the first iteration, we want to enable it in:
1). zookeeper-server
2). zookeeper-jute
3). zookeeper-prometheus-metrics

UPDATE 2019-08-27:

x. Turn on the checkstyle configuration at the project level while suppressing it for zookeeper-contrib.
100% 129600 0
29 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3430

Observability improvement: provide top N read / write path queries

Improvement Resolved Major Fixed Michael Han Michael Han Michael Han 17/Jun/19 17:19   01/Aug/19 17:04 01/Aug/19 01:28 3.6.0 3.6.0 server   0 2 0 13800   We would like to have a better understanding of the types of workloads that hit ZK, and one aspect of such understanding is being able to answer queries for the top N read and top N write request paths. Knowing the hot request paths will allow us to better optimize for such workloads, for example, by enabling path-specific caching, or changing the path structure (e.g. breaking a long path into hierarchical paths). 100% 100% 13800 0 Twitter, pull-request-available
33 weeks ago
ZooKeeper ZOOKEEPER-3429

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset

Sub-task Open Major Unresolved Unassigned maoling maoling 13/Jun/19 04:50   14/Dec/19 06:08     3.7.0 tests   0 1 0 1800   [https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-java9/lastFailedBuild/testReport/junit/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset/]

 
{code:java}
Error Message
test timed out after 840000 milliseconds
Stacktrace
org.junit.runners.model.TestTimedOutException: test timed out after 840000 milliseconds
at java.base@9.0.1/java.lang.Object.wait(Native Method)
at java.base@9.0.1/java.lang.Object.wait(Object.java:516)
at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1556)
at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1539)
at app//org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1537)
at app//org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:247)
at java.base@9.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base@9.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base@9.0.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at app//org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80)
at java.base@9.0.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base@9.0.1/java.lang.Thread.run(Thread.java:844)
{code}
 
100% 100% 1800 0 pull-request-available
36 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3428

Enable the TTL node and add a lazy-delete strategy to get NoNodeException quickly when the TTL node has expired

Improvement Open Major Unresolved maoling maoling maoling 12/Jun/19 06:32   14/Dec/19 06:06     3.7.0 server   0 2
40 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3427

Introduce SnapshotComparer that assists debugging with snapshots.

Improvement Resolved Major Fixed Maya Wang Michael Han Michael Han 11/Jun/19 19:36   02/Mar/20 21:16 28/Feb/20 04:58 3.6.0 3.7.0 server   0 1 0 33000   SnapshotComparer is a tool that loads and compares two snapshots, with a configurable threshold and various filters. It's useful in use cases that involve snapshot analysis, such as offline data consistency checking and data trending analysis (e.g. what's growing under which znode path, and when).

A sample output of the tool (actual numbers removed, due to sensitivity).
{code:java}
Successfully parsed options!

Deserialized snapshot in snapshot.0 in seconds

Processed data tree in seconds

Deserialized snapshot in snapshot.1 in seconds

Processed data tree in seconds

Node count:

Total size:

Max depth:

Count of nodes at depth 1:

Count of nodes at depth 2:

Count of nodes at depth 3:

Count of nodes at depth 4:

Count of nodes at depth 5:

Count of nodes at depth 6:

Count of nodes at depth 7:

Count of nodes at depth 8:

Count of nodes at depth 9:

Count of nodes at depth 10:

Count of nodes at depth 11:


Node count:

Total size:

Max depth:

Count of nodes at depth 1:

Count of nodes at depth 2:

Count of nodes at depth 3:

Count of nodes at depth 4:

Count of nodes at depth 5:

Count of nodes at depth 6:

Count of nodes at depth 7:

Count of nodes at depth 8:

Count of nodes at depth 9:

Count of nodes at depth 10:

Count of nodes at depth 11:




Analysis for depth 0

Analysis for depth 1

Analysis for depth 2

Analysis for depth 3

Analysis for depth 4

Analysis for depth 5

Analysis for depth 6

Analysis for depth 7

Analysis for depth 8

Analysis for depth 9

Analysis for depth 10
{code}
100% 100% 33000 0 pull-request-available
2 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3426

ZK prime_connection (the handshake) can complete without reading all the payload.

Bug Open Blocker Unresolved Unassigned Suhas Dantkale Suhas Dantkale 11/Jun/19 15:11   11/Jun/19 15:11       c client   0 1
{code:c}
/* returns:
 * -1 if recv call failed,
 * 0 if recv would block,
 * 1 if success
 */
static int recv_buffer(zhandle_t *zh, buffer_list_t *buff)
{
    int off = buff->curr_offset;
    int rc = 0;
    [................]
    if (buff == &zh->primer_buffer && rc == buff->len - 1) ++rc; /* <====== Handshake prematurely complete. */
{code}





On a non-blocking socket, it's possible that the socket has exactly "buff->len - 1" bytes to read.
Because of the line above, the handshake is prematurely completed.
What this can lead to is:
There will be one outstanding byte left on the socket, and it might go as part of the next message, which could get corrupted.

I think this can lead to ZRUNTIMEINCONSISTENCY issues later.
40 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3425

Rank the TTL nodes by expireTime ascending for performance

Improvement Open Minor Unresolved maoling maoling maoling 11/Jun/19 06:49   14/Dec/19 06:08     3.7.0 server   0 2 0 21000   100% 100% 21000 0 pull-request-available
17 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3424

download page file package broken

Bug Open Trivial Unresolved Unassigned Wei Xin Wei Xin 11/Jun/19 05:21   27/Aug/19 09:45   3.5.5   build   0 2
29 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3423

Use the Maven-like way to ignore the generated version Java files and document the command './zkServer.sh version'

Improvement Resolved Minor Fixed maoling maoling maoling 11/Jun/19 01:51   07/Jul/19 06:22 26/Jun/19 04:35   3.6.0 scripts   0 2 0 12000   100% 100% 12000 0 pull-request-available
38 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3422

add a getDataList operation

Improvement Open Minor Unresolved Unassigned Peter Welch Peter Welch 11/Jun/19 01:33   11/Jun/19 04:41           0 1 0 1800   In a single request, support calling getData on a List of znodes.  Fetching batches of data could be more performant and make for simpler client code than making multiple getData requests.  The savings presumably would be in serde and in the server processing pipeline.  Useful for use cases such as cold start from a large dataset. 100% 100% 1800 0 pull-request-available
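As a rough sketch of the proposed shape (the names are hypothetical; a plain map stands in for the server-side data tree rather than the real ZooKeeper API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical getDataList: one request returns data for many znodes,
// paying serde and pipeline costs once instead of once per path.
public class BatchedReads {
    private final Map<String, byte[]> dataTree = new HashMap<>();

    public void put(String path, byte[] data) {
        dataTree.put(path, data);
    }

    /** Data for each requested path, in order; null for missing znodes. */
    public List<byte[]> getDataList(List<String> paths) {
        List<byte[]> result = new ArrayList<>(paths.size());
        for (String path : paths) {
            result.add(dataTree.get(path));
        }
        return result;
    }
}
```

A real implementation would also need a per-path Stat and error code, but the batching shape is the point here.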
40 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3421

Better insight into Observer connections

Wish Resolved Minor Fixed Unassigned Brian Nixon Brian Nixon 10/Jun/19 18:38   28/Jun/19 22:17 28/Jun/19 18:34 3.6.0 3.6.0 server   0 2 0 6000   With the introduction of the Learner Master feature in ZOOKEEPER-3140, tracking the state of the Observers synced with the voting quorum became more difficult from an operational perspective. Observers could now be synced with any voting member, not just the leader, and discovering where an observer was being hosted required digging into the server logs or complex JMX queries.

 

Add commands that externalize the state of observers from the point of view of the voting quorum.
100% 100% 6000 0 pull-request-available
37 weeks, 5 days ago
ZooKeeper ZOOKEEPER-3420

With newer ZK C Client and older ZK server, recv_buffer() could potentially return 0 continuously on non-blocking socket

Bug Open Major Unresolved Unassigned Suhas Dantkale Suhas Dantkale 07/Jun/19 21:09   02/Aug/19 08:14   3.5.3   c client   0 1 0 3600   With a newer ZK C client (3.5.*) and an older ZK server (3.4.*), recv_buffer() could potentially return 0 continuously on a non-blocking socket.

The relevant check in the recv_buffer() snippet below: should it be
{code:c}
if (buff == &zh->primer_buffer && buff->curr_offset + rc == buff->len + sizeof(buff->len) - 1) ++rc;
{code}
instead of
{code:c}
if (buff == &zh->primer_buffer && rc == buff->len - 1) ++rc;
{code}

snippet:
{code:c}
if (buff->buffer) {
    /* want off to now represent the offset into the buffer */
    off -= sizeof(buff->len);

    rc = recv(zh->fd, buff->buffer+off, buff->len-off, 0);

    /* dirty hack to make new client work against old server
     * old server sends 40 bytes to finish connection handshake,
     * while we're expecting 41 (1 byte for read-only mode data) */
    if (buff == &zh->primer_buffer && rc == buff->len - 1) ++rc; /* <====== Problem Line(?) */

    switch (rc) {
    case 0:
        errno = EHOSTDOWN;
    case -1:
#ifdef _WIN32
        if (WSAGetLastError() == WSAEWOULDBLOCK) {
#else
        if (errno == EAGAIN) {
#endif
            break;
        }
        return -1;
    default:
        buff->curr_offset += rc;
    }
}
return buff->curr_offset == buff->len + sizeof(buff->len);
{code}


The given code probably assumes that the recv() operation will read everything in one go.
But on a non-blocking socket, that assumption doesn't hold.
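A correct reader instead accumulates partial reads until every expected byte has arrived. A minimal sketch of that contract (buffer handling only, no sockets; the names are illustrative):

```java
// Sketch of partial-read handling: accumulate whatever each read delivers
// and report completion only when the whole payload is present -- never
// "round up" a short read the way the dirty hack above does.
public class PartialReadBuffer {
    private final byte[] buffer;
    private int currOffset = 0;

    public PartialReadBuffer(int expectedLen) {
        this.buffer = new byte[expectedLen];
    }

    /** Consumes one chunk (possibly shorter than expected); returns bytes taken. */
    public int readChunk(byte[] chunk) {
        int n = Math.min(chunk.length, buffer.length - currOffset);
        System.arraycopy(chunk, 0, buffer, currOffset, n);
        currOffset += n; // a short read is not an error, just "try again later"
        return n;
    }

    public boolean isComplete() {
        return currOffset == buffer.length;
    }
}
```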
100% 100% 3600 0 pull-request-available
40 weeks, 5 days ago
ZooKeeper ZOOKEEPER-3419

Backup and recovery support

New Feature Open Major Unresolved Michael Han Michael Han Michael Han 06/Jun/19 18:31   06/Jun/19 19:56   3.6.0   server   0 3   Historically ZooKeeper has had no intrinsic support for backup and restore. The usual approach is customized scripts that copy data around, or 3rd-party tools (Exhibitor, etc.), which introduces operational burden.

This Jira will introduce another option: direct support for backup and restore from ZooKeeper itself. It's completely built into ZooKeeper, supports point-in-time recovery of an entire tree after an oops event, supports recovering a partial tree for test/dev purposes, and can help replay history for bug investigation. It will try to provide a generic interface so the backups can be directed to different data storage systems (S3, Kafka, HDFS, etc).

This same system has been in production at Twitter for X years and has proved to be quite helpful for the various use cases mentioned earlier. This will be a relatively big patch; we'll try to break the feature down and incrementally submit the patches when they are ready.
Twitter
41 weeks ago
ZooKeeper ZOOKEEPER-3418

Improve quorum throughput through eager ACL checks of requests on local servers

Improvement Resolved Major Fixed Michael Han Michael Han Michael Han 06/Jun/19 17:21   17/Nov/19 20:37 01/Aug/19 01:31 3.6.0 3.6.0 server   0 4 0 13800   Serving write requests that change the state of the system requires quorum operations, and in some cases, the quorum operations can be avoided if the requests are doomed to fail. ACL check failure is such a case. To optimize for this case, we elevate the ACL check logic and perform eager ACL check on local server (where the requests are received), and fail fast, before sending the requests to leader. 

As with any feature, there is a feature flag that can turn it on or off (default off). This feature is also forward compatible: any new op code (and some existing op codes we did not explicitly check against) will pass the check and (potentially) fail on the leader side, instead of being prematurely filtered out on the local server.

The end result is better throughput and stability of the quorum for certain workloads.
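The fail-fast idea reduces to a permission-mask check performed before the quorum round trip. A sketch under assumed names (the perm bits loosely mirror ZooDefs.Perms; this is not the committed patch):

```java
// Eager ACL check sketch: when the feature flag is on, a request whose
// required permissions are not granted is rejected on the local server,
// saving a doomed quorum operation.
public class EagerAclCheck {
    public static final int READ = 1 << 0;
    public static final int WRITE = 1 << 1;

    /** Returns true if the request may be forwarded to the leader. */
    public static boolean passesLocalCheck(int grantedPerms, int requiredPerms,
                                           boolean featureEnabled) {
        if (!featureEnabled) {
            return true; // default off: behave exactly as before
        }
        return (grantedPerms & requiredPerms) == requiredPerms;
    }
}
```

Forward compatibility falls out naturally: an op code the local check does not recognize simply requires no perms locally and is passed through to the leader.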
100% 100% 13800 0 Twitter, pull-request-available
33 weeks ago
ZooKeeper ZOOKEEPER-3417

Add the new doc zookeeperProtocols to introduce the implementation details of ZAB compared with Raft

New Feature Open Major Unresolved maoling maoling maoling 05/Jun/19 22:02   13/Nov/19 20:49   3.6.0   documentation   0 1 0 2400   100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks ago 0|z03gpk:
ZooKeeper ZOOKEEPER-3416

Remove redundant ServerCnxnFactoryAccessor

Improvement Resolved Minor Fixed Michael Han Michael Han Michael Han 05/Jun/19 15:54   07/Jun/19 14:47 07/Jun/19 08:13 3.6.0 3.6.0 tests   0 2 0 3600   We have two ways to access the private zkServer inside ServerCnxnFactory, and there is really no need to keep maintaining both. We could remove ServerCnxnFactoryAccessor when we added the public accessor for ServerCnxnFactory in ZOOKEEPER-1346, but we did not.

The solution is to consolidate all access of the zkServer through the public accessor of ServerCnxnFactory. The end result is cleaner code base and less confusion.

100% 100% 3600 0 Twitter, pull-request-available
40 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3415

Convert internal logic to use Java 8 streams

Wish Open Trivial Unresolved Unassigned Brian Nixon Brian Nixon 05/Jun/19 13:39   28/Aug/19 14:18   3.6.0       0 4   There are a number of places in the code where for loops are used to perform basic filtering and collection. The Java 8 stream APIs make these operations much more polished. Since the master branch has been at this language level for a while, I'd wish for a (series of) refactor(s) to convert more of these loops to streams. newbie
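The kind of refactor being wished for, shown on a made-up filtering loop (both variants behave identically; the method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Before/after example of the requested refactor: an explicit
// filter-and-collect loop versus the equivalent Java 8 stream pipeline.
public class StreamRefactor {
    /** Before: hand-rolled loop. */
    public static List<String> childrenWithPrefix(List<String> paths, String prefix) {
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            if (p.startsWith(prefix)) {
                result.add(p);
            }
        }
        return result;
    }

    /** After: the same filtering expressed as a stream. */
    public static List<String> childrenWithPrefixStream(List<String> paths, String prefix) {
        return paths.stream()
                    .filter(p -> p.startsWith(prefix))
                    .collect(Collectors.toList());
    }
}
```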
41 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3414

sync API should throw NoNodeException when syncing a path which does not exist

Bug Open Minor Unresolved Rabi Kumar K C maoling maoling 05/Jun/19 05:28   17/Feb/20 12:15     3.7.0 java client   0 3 0 10800   [zk: 127.0.0.1:2180(CONNECTED) 0] sync /c1
Sync is OK
[zk: 127.0.0.1:2180(CONNECTED) 1] sync /c1dsafasdfasdfadsfasd
Node does not exist: /c1dsafasdfasdfadsfasd
100% 100% 10800 0 pull-request-available
41 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3413

Add a serialVersionUID to ClientCnxnLimitException so it compiles without warnings

Improvement Reopened Minor Unresolved Unassigned maoling maoling 05/Jun/19 04:47   14/Jun/19 08:09   3.6.0   java client   0 3 0 1800   build-generated:
[javac] Compiling 2 source files to /Users/wenba/workspaces/workspace_zookeeper/zookeeper/build/classes

compile:
[javac] Compiling 49 source files to /Users/wenba/workspaces/workspace_zookeeper/zookeeper/build/classes
[javac] /Users/wenba/workspaces/workspace_zookeeper/zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server/ClientCnxnLimitException.java:24: warning: [serial] serializable class ClientCnxnLimitException has no definition of serialVersionUID
[javac] public class ClientCnxnLimitException extends Exception {
[javac] ^
[javac] 1 warning
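The warning goes away once the class declares an explicit serialVersionUID. A sketch of the fix (the constant's value and the message are illustrative; only the field matters):

```java
// Declaring serialVersionUID silences the [serial] javac warning and pins
// the serialized form across recompiles. Exception is already Serializable,
// so only the field needs adding.
public class ClientCnxnLimitException extends Exception {
    private static final long serialVersionUID = 1L; // illustrative value

    public ClientCnxnLimitException() {
        super("Connection throttled: too many concurrent client connections");
    }
}
```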
100% 100% 1800 0 pull-request-available
39 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3412

Bad signature of tarball 3.5.5 on EU mirror

Bug Resolved Major Cannot Reproduce Unassigned Mykola Rak Mykola Rak 03/Jun/19 07:39   06/Jun/19 09:57 06/Jun/19 09:57 3.5.5       0 1   I downloaded files from an EU mirror:
{code:java}
mrak@mrak:~/tmp/1$ ls
apache-zookeeper-3.5.5.tar.gz.asc KEYS zookeeper-3.5.5.tar.gz

{code}

Then I try to verify the signature:


{code:java}
mrak@mrak:~/tmp/1$ gpg --import ./KEYS
gpg: key E22A746A68E327C1: public key "Patrick Hunt (ZooKeeper release signing key) <phunt@apache.org>" imported
gpg: key 7C9476266E1CC7A4: public key "Benjamin Reed (CODE SIGNING KEY) <breed@apache.org>" imported
gpg: key 0DFF492D8EE2F25C: public key "Mahadev Konar (CODE SIGNING KEY) <mahadev@apache.org>" imported
gpg: key 93FB0254D2C80E32: public key "Flavio Junqueira (CODE SIGNING KEY) <fpj@apache.org>" imported
gpg: key C2C0FDE0820F225C: public key "Michi Mutsuzaki (CODE SIGNING KEY) <michim@apache.org>" imported
gpg: key BE3B6B9392BC2F2B: public key "Raul Gutierrez Segales <rgs@apache.org>" imported
gpg: key A1350C22ADAFD097: public key "Chris Nauroth (CODE SIGNING KEY) <cnauroth@apache.org>" imported
gpg: key F5CECB3CB5E9BD2D: "Rakesh Radhakrishnan (CODE SIGNING KEY) <rakeshr@apache.org>" not changed
gpg: key 59147497767E7473: "Michael Han (CODE SIGNING KEY) <hanm@apache.org>" not changed
gpg: key 15072ED241CF31A9: public key "Abraham Fine (CODE SIGNING KEY) <afine@apache.org>" imported
gpg: key BDB2011E173C31A2: 4 signatures not checked due to missing keys
gpg: key BDB2011E173C31A2: "Abraham Fine <abe@abrahamfine.com>" 3 new signatures
gpg: key FFE35B7F15DFA1BA: "Andor Molnar <andor@apache.org>" not changed
gpg: Total number processed: 12
gpg: imported: 8
gpg: unchanged: 3
gpg: new signatures: 3
gpg: no ultimately trusted keys found

{code}
Verification failed with a BAD signature error:
 
{code:java}
mrak@mrak:~/tmp/1$ gpg --verify ./apache-zookeeper-3.5.5.tar.gz.asc ./zookeeper-3.5.5.tar.gz
gpg: Signature made Fri 03 May 2019 02:08:41 PM CEST
gpg: using RSA key FFE35B7F15DFA1BA
gpg: BAD signature from "Andor Molnar <andor@apache.org>" [unknown]

{code}
Important
41 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3411

remove the deprecated CLI: ls2 and rmr

Improvement Resolved Minor Fixed Rabi Kumar K C maoling maoling 03/Jun/19 04:41   10/Jan/20 12:09 09/Jan/20 16:57 3.6.0 3.6.0, 3.7.0 scripts   0 3 0 4200   remove the deprecated CLI: *ls2* and *rmr*

See the discussion in [https://github.com/apache/zookeeper/pull/833]
100% 100% 4200 0 newbie, pull-request-available
10 weeks ago
ZooKeeper ZOOKEEPER-3410

./zkTxnLogToolkit.sh will throw an NPE and stop the process of formatting txn logs because the data's content is null

Bug Resolved Minor Resolved maoling maoling maoling 01/Jun/19 09:33   11/Jul/19 22:04 11/Jul/19 22:04 3.6.0   scripts   0 2 0 5400   [zk: 127.0.0.1:2180(CONNECTED) 26] create -t 500 /ttl_node

19-5-30 06:10:50 PM session 0x10007a75c0c0000 cxid 0x0 zxid 0x6 createSession 30000
Exception in thread "main" java.lang.NullPointerException
at java.lang.String.<init>(String.java:566)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.getDataStrFromTxn(TxnLogToolkit.java:316)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.printTxn(TxnLogToolkit.java:272)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.printTxn(TxnLogToolkit.java:266)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:217)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:116)

{code:java}
txnData.append(createTTLTxn.getPath() + "," + new String(createTTLTxn.getData()))
       .append("," + createTTLTxn.getAcl() + "," + createTTLTxn.getParentCVersion())
       .append("," + createTTLTxn.getTtl());
{code}
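The fix is to guard the data payload before building the string, since createSession and TTL-node txns can carry a null data payload. A minimal null-safe sketch (the method names mirror the report but are illustrative):

```java
// Null-safe formatting: the NPE came from new String(null); substitute a
// literal "null" so the toolkit keeps formatting the remaining txn logs.
public class TxnFormatter {
    public static String dataToString(byte[] data) {
        return data == null ? "null" : new String(data);
    }

    public static String formatCreateTtlTxn(String path, byte[] data,
                                            String acl, int parentCVersion, long ttl) {
        return path + "," + dataToString(data)
             + "," + acl + "," + parentCVersion
             + "," + ttl;
    }
}
```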
100% 100% 5400 0 pull-request-available
36 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3409

ZOOKEEPER-3351 Compile Java code with -Xdoclint

Sub-task Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 31/May/19 11:46   03/Jun/19 17:56 03/Jun/19 13:41 3.6.0 3.6.0 build   0 2 0 12000   In order to drop the Ant build script we have to compile with the -Xdoclint option.

Parent issue: ZOOKEEPER-3351
100% 100% 12000 0 pull-request-available
41 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3408

Improve information about risks for forceSync config option

Improvement Open Minor Unresolved Unassigned Dmitry Konstantinov Dmitry Konstantinov 31/May/19 08:27   03/Jun/19 19:32       documentation   0 2   [https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#Unsafe+Options]

{quote}
The following options can be useful, but be careful when you use them. The risk of each is explained along with the explanation of what the variable does.
{quote}

{quote}
_forceSync_ : (Java system property: *zookeeper.forceSync*) Requires updates to be synced to media of the transaction log before finishing processing the update. If this option is set to no, ZooKeeper will not require updates to be synced to the media.
{quote}
The risks for this option are not very clear. Is the only risk the loss of some recently committed transactions if all ZooKeeper instances in an ensemble crash and restart at almost the same time, or are other problems also possible?

41 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3407

Update POM file with new information

Task Resolved Trivial Fixed Lars Francke Lars Francke Lars Francke 29/May/19 18:13   25/Jun/19 21:04 25/Jun/19 16:10   3.6.0     0 2 0 3600   New mailing lists & Jenkins update 100% 100% 3600 0 pull-request-available
38 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3406

Update website for new mailing lists

Task Resolved Minor Fixed Unassigned Lars Francke Lars Francke 29/May/19 18:08   31/May/19 07:22 31/May/19 07:13   3.6.0     0 1 0 7200   This updates the website to include information about issues@ and notifications@ 100% 100% 7200 0 pull-request-available
41 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3405

OWASP flagging jackson-databind

Bug Closed Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 28/May/19 17:21   16/Oct/19 14:58 30/May/19 09:56 3.6.0, 3.5.5 3.6.0, 3.5.6     0 1 0 3600   The OWASP job is flagging jackson-databind for an update:

CVE-2019-12086 CWE-200 Information Exposure Medium(5.0) jackson-databind-2.9.8.jar
100% 100% 3600 0 pull-request-available
42 weeks ago
ZooKeeper ZOOKEEPER-3404

BouncyCastle upgrade to 1.61 might cause flaky test issues

Bug Closed Major Fixed Unassigned Andor Molnar Andor Molnar 27/May/19 16:48   16/Oct/19 14:59 28/May/19 10:08 3.6.0 3.6.0, 3.5.6 tests   0 2 0 3600   I've seen a lot of test timeout errors with QuorumSSL tests since I upgraded master to BouncyCastle 1.61 due to a Java 9 warning. The warning has been reported by [~eolivelli], which we tried to solve with the upgrade, but the warning message is still present, so I don't see any harm in downgrading to the previous version.

The timeout errors are very frequent with recent Java versions (11+) and quite rare with Java 8.

I think it's worth a try to downgrade and see if tests will be in a better shape.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 1 day ago 0|z034ig:
ZooKeeper ZOOKEEPER-3403

ZooKeeper bundle 3.5.5 is not OSGi compliant

Bug Open Major Unresolved Unassigned shriram shriram 27/May/19 06:32   27/May/19 07:29   3.5.5       0 2   Downloaded ZooKeeper version 3.5.5 for the security alert *CVE-2019-0201*. The bundle is not OSGi compliant, but version 3.4.14, released for the same CVE, is OSGi compliant.

 

*CVE reference:*

[https://seclists.org/oss-sec/2019/q2/119]
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 3 days ago 0|z033zc:
ZooKeeper ZOOKEEPER-3402

Add a multiRead operation

Improvement Resolved Minor Fixed Peter Szecsi Peter Szecsi Peter Szecsi 26/May/19 19:32   26/Jun/19 08:55 26/Jun/19 04:58   3.6.0     0 3 0 25800   There is already an expressed need and desire from the community to support the multi version of the getData and getChildren operations. We could create a multiRead operation which behaves just like the multi, however, only accepts read operations. It would provide a common interface for batched read operations instead of different multi versions and the read operations could be even mixed in one request. 100% 100% 25800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
38 weeks, 1 day ago 0|z033h4:
ZooKeeper ZOOKEEPER-3401

ZOOKEEPER-3245 Fix metric PROPOSAL_ACK_CREATION_LATENCY

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 23/May/19 13:47   08/Jul/19 17:30 28/May/19 09:56   3.6.0 metric system   0 2 0 6000   100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 1 day ago 0|z030c0:
ZooKeeper ZOOKEEPER-3400

Add documentation on local sessions

Improvement Resolved Major Fixed maoling Brian Nixon Brian Nixon 22/May/19 17:51   10/Oct/19 10:38 10/Oct/19 07:13 3.6.0, 3.5.6 3.6.0 documentation   0 3 0 10800   ZOOKEEPER-1147 added local sessions (client sessions not ratified by the leader) to ZooKeeper as a lightweight augmentation of the existing global sessions.

 

Add some outward facing documentation that describes this feature ([https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html#ch_zkSessions] seems like a reasonable place).
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
23 weeks ago 0|z02z5c:
ZooKeeper ZOOKEEPER-3399

Remove logging in getGlobalOutstandingLimit for optimal performance.

Bug Resolved Major Fixed Michael Han Michael Han Michael Han 21/May/19 21:42   25/May/19 16:27 25/May/19 13:46 3.6.0 3.6.0 server   0 3 0 3600   Recently we have moved some of our production clusters to the top of the trunk. One issue we found is a performance regression on read and write latency on the clusters where the quorum is also serving traffic. The average read latency increased by 50x, p99 read latency increased by 300x. 

The root cause is a log statement introduced in ZOOKEEPER-3177 (PR711), where we added a LOG.info statement in getGlobalOutstandingLimit. getGlobalOutstandingLimit is on the critical code path for request processing and is called twice for each request (once when processing the packet, once when finalizing the request response). This not only degrades the performance of the server but also bloats the log file when the QPS of a server is high.

This only impacts clusters when the quorum (leader + follower) is serving traffic. For clusters where only observers are serving traffic no impact is observed.
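The fix amounts to keeping logging off the per-request path. A minimal sketch of the idea (hypothetical class and flag names; the real code lives in the ZooKeeper server):

```java
class OutstandingLimit {
    // Hypothetical sketch: read the limit from a system property and log it
    // at most once, instead of logging on every call from the request path.
    static final int DEFAULT_LIMIT = 1000;
    private static volatile boolean logged = false;

    static int getGlobalOutstandingLimit() {
        int limit = Integer.getInteger("zookeeper.globalOutstandingLimit", DEFAULT_LIMIT);
        if (!logged) {            // log once, not twice per request
            logged = true;
            System.out.println("globalOutstandingLimit = " + limit);
        }
        return limit;
    }
}
```

Moving the log statement to initialization (or guarding it as above) keeps the hot path free of I/O.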

 

 
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 5 days ago 0|z02xq8:
ZooKeeper ZOOKEEPER-3398

Learner.connectToLeader() may take too long to time-out

Improvement Resolved Minor Fixed Vladimir Ivić Vladimir Ivić Vladimir Ivić 20/May/19 21:39   15/Jul/19 07:11 12/Jul/19 11:02   3.6.0 leaderElection, quorum   0 2 0 19200   After leader election happens, the followers will connect to the leader which is facilitated by the Learner.connectToLeader() method. 

Learner.connectToLeader() relies on the initLimit configuration value to time out when the network connection is unreliable. This config may be set to a high value, which could leave the ensemble retrying and waiting without a quorum for too long. The follower will retry up to 5 times. 

This patch introduces a new configuration directive, `connectToLeaderLimit`, that allows ZooKeeper to use a separate time-out value, which can be set lower than `initLimit`.
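The fallback semantics can be sketched as follows (a hypothetical helper following the ticket's naming, not the actual patch): when `connectToLeaderLimit` is unset, the timeout degenerates to the old `initLimit`-based value.

```java
class ConnectTimeout {
    // Hypothetical sketch: ZooKeeper timeouts are expressed as a multiple
    // of tickTime; prefer connectToLeaderLimit when it is configured.
    static int connectToLeaderTimeoutMs(int tickTimeMs, int initLimit, Integer connectToLeaderLimit) {
        int limit = (connectToLeaderLimit != null) ? connectToLeaderLimit : initLimit;
        return limit * tickTimeMs;
    }
}
```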

 
100% 100% 19200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
35 weeks, 6 days ago 0|z02vnk:
ZooKeeper ZOOKEEPER-3397

API docs are incorrectly linked in release artifacts

Bug Open Major Unresolved Unassigned Andor Molnar Andor Molnar 20/May/19 13:43   20/May/19 13:43   3.5.5   documentation   0 1   The generated API docs are now located in 2 different folders:
* zookeeper-server
* zookeeper-jute

But the navbar still points to api/index.html, which is now a broken link and needs to be fixed.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 3 days ago 0|z02v9k:
ZooKeeper ZOOKEEPER-3396

Flaky test in RestoreCommittedLogTest

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 15/May/19 16:30   20/May/19 11:35 20/May/19 06:25 3.6.0 3.6.0 tests   0 2 0 3600   The patch for ZOOKEEPER-3244 ([https://github.com/apache/zookeeper/pull/770]) introduced a flaky test RestoreCommittedLogTest::testRestoreCommittedLogWithSnapSize.

 

Get it running consistently.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 3 days ago 0|z02qfc:
ZooKeeper ZOOKEEPER-3395

Document individual admin commands in markdown

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 14/May/19 14:59   06/Aug/19 00:49 05/Aug/19 19:43 3.6.0, 3.5.6 3.6.0 documentation   0 2 0 4800   The "ZooKeeper Commands" section of the ZooKeeper Administrator's Guide takes time to document each four letter command individually, but when it comes to the admin commands, it just directs the user to query a live peer in order to get the supported list (e.g. curl http://localhost:8080/commands). While such a query will provide the best source for the admin commands available on a given ZooKeeper version, it's no replacement for the role that the central guide provides.

Create an enumerated list of the supported admin commands in the section "The AdminServer" in the style that the four letter commands are documented in "The Four Letter Words".

 
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z02omw:
ZooKeeper ZOOKEEPER-3394

Delay observer reconnect when all learner masters have been tried

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 13/May/19 23:51   22/May/19 17:16 21/May/19 17:43 3.6.0 3.6.0 quorum   0 1 0 7200   Observers will disconnect when the voting peers perform a leader election and reconnect after. The delay zookeeper.observer.reconnectDelayMs was added to insulate the voting peers from the observers returning. With a large number of peers and the observerMaster feature active, this delay is mostly detrimental as it means that the observer is more likely to get hung up on connecting to a bad (down/corrupt) peer and it would be better off switching to a new one quickly.

To retain the protective virtue of the delay, it makes sense to delay only after all observer masters in the list have been tried, before iterating through the list again. In the case where observer masters are not active, this degenerates to a delay between connection attempts on the leader.
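The scheduling rule above can be sketched like this (a hypothetical helper, not the actual patch): a delay is applied only when a connection attempt wraps around to the start of the learner-master list.

```java
class ObserverReconnect {
    // Hypothetical sketch: attempt is 0-based; a non-zero delay applies only
    // once the whole list of learner masters has been tried.
    static long delayBeforeAttemptMs(int attempt, int masterCount, long reconnectDelayMs) {
        return (attempt > 0 && attempt % masterCount == 0) ? reconnectDelayMs : 0L;
    }
}
```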
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 2 days ago 0|z02nkg:
ZooKeeper ZOOKEEPER-3393

Read-only file system may make the whole ZooKeeper cluster to be unavailable.

Bug Open Major Unresolved Unassigned Jiafu Jiang Jiafu Jiang 13/May/19 21:41   13/May/19 21:41   3.4.12, 3.4.14   leaderElection, server   0 1   Say we have 3 nodes: zk1, zk2, and zk3, zk3 is the leader.

If the file system of the ZooKeeper data directory of the leader is read-only due to some hardware error, the leader will exit and begin a new election.

But the election will keep looping: the new leader may be zk3 again, and zk3 will fail to write its epoch to disk due to the read-only file system.

 

Since we have 3 nodes, if only one of them has a problem, shouldn't the ZooKeeper cluster remain available? If the answer is yes, then we ought to fix this problem.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
44 weeks, 2 days ago 0|z02nh4:
ZooKeeper ZOOKEEPER-3392

Add admin command to display last snapshot information

Improvement Resolved Trivial Fixed Unassigned Brian Nixon Brian Nixon 13/May/19 19:08   27/May/19 09:18 27/May/19 04:56 3.6.0 3.6.0 server   0 2 0 3000   Basic systems to back up ZooKeeper data will maintain snapshot files of the data tree. In order to understand the health of these systems, they need a way to determine how up to date their files are relative to the current state of the ensemble.

Add an admin command that exposes the zxid and timestamp of the last saved/restored snapshot of the server. This will let such a backup system know when it can update and when it is stale.
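A minimal sketch of what such a command's payload might contain (a hypothetical shape; AdminServer commands return key/value maps rendered as JSON, and zxids are conventionally reported in hex):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LastSnapshotCommand {
    // Hypothetical sketch of the command's response map: hex zxid plus
    // the snapshot timestamp in epoch milliseconds.
    static Map<String, Object> run(long lastZxid, long lastSnapshotTimeMs) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("last_zxid", "0x" + Long.toHexString(lastZxid));
        response.put("last_snapshot_time", lastSnapshotTimeMs);
        return response;
    }
}
```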
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 3 days ago 0|z02n94:
ZooKeeper ZOOKEEPER-3391

Drop unused CSVInputArchive and XMLInputArchive

Improvement Resolved Major Fixed Zili Chen Zili Chen Zili Chen 13/May/19 14:04   20/Jun/19 22:55 20/Jun/19 14:26   3.6.0 jute   0 2 0 4200   As described in

http://zookeeper-user.578899.n2.nabble.com/Deprecated-CSVInputArchive-and-XMLInputArchive-td7584086.html

these 2 input archives are not actively maintained and we probably don't have test coverage for them either, so keeping them in the codebase is questionable.

So this is the ticket to track dropping the unused CSVInputArchive and XMLInputArchive.
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
39 weeks ago 0|z02myw:
ZooKeeper ZOOKEEPER-3390

Broken Link - Release Notes (PDF)

Bug Open Trivial Unresolved Unassigned Edgar Pascoal Edgar Pascoal 13/May/19 09:53   13/May/19 09:53   3.5.4       0 1   Not possible to obtain the release notes in PDF format (broken link):

[https://zookeeper.apache.org/doc/r3.5.4-beta/releasenotes.pdf]

!image-2019-05-13-14-51-18-496.png|width=429,height=125!

 

!image-2019-05-13-14-49-14-464.png|width=528,height=267!
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
44 weeks, 3 days ago 0|z02mq0:
ZooKeeper ZOOKEEPER-3389

Zookeeper does not export all required packages in OSGi (needed for curator)

Bug Open Minor Unresolved Unassigned Jiri Ondrusek Jiri Ondrusek 13/May/19 06:37   26/Jun/19 07:42   3.4.10, 3.5.5 3.4.15     0 3 0 30600   Install Zookeeper and Curator (4.1+) in OSGi.
Some exported packages are missing.
The problem happens on both 3.4.x and 3.5.x.
100% 100% 30600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
38 weeks, 1 day ago 0|z02mhs:
ZooKeeper ZOOKEEPER-3388

Allow client port to support plaintext and encrypted connections simultaneously

Improvement Closed Minor Fixed Unassigned Brian Nixon Brian Nixon 12/May/19 14:37   14/Feb/20 10:23 04/Jun/19 18:30 3.6.0 3.6.0, 3.5.7 server   0 2 0 16800   ZOOKEEPER-2125 extended the ZooKeeper server-side to handle encrypted client connections by allowing the server to open a second client port (the secure client port) to manage this new style of traffic. A server is able to handle plaintext and encrypted clients simultaneously by managing each on their respective ports.

When it comes time to get all clients connecting to your system to start using encryption, this approach requires that they make two changes simultaneously: altering their client properties to start using the secure settings, and altering the routing information they use to connect to the ensemble. If either is misconfigured, the client is cut off from the ensemble. With a large deployment of clients owned by different teams and different tools, this presents a danger when activating the feature. Ideally, the two changes could be staggered so that the encryption feature is activated first and the routing information is changed in a subsequent phase.

Allow the server connection factory managing the regular client port to handle both plaintext and encrypted connections. This will be independent of the operation of the server connection factory managing the secure client port, but similar settings ought to apply to both (e.g. cipher suites) to maintain interoperability.
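One common way to implement such port unification (a sketch of the general technique, not necessarily what the patch did): peek at the first byte of a new connection. A TLS connection always begins with a handshake record whose content type is 0x16, while plaintext ZooKeeper traffic does not start with that byte, so the server can decide per-connection whether to install the SSL handler.

```java
class PortUnificationSketch {
    // TLS record content type 0x16 = handshake: the first byte any TLS
    // client sends. Plaintext ZooKeeper requests start differently.
    static boolean looksLikeTls(byte firstByte) {
        return firstByte == 0x16;
    }
}
```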
100% 100% 16800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
13 weeks, 3 days ago 0|z02m1k:
ZooKeeper ZOOKEEPER-3387

FileWriter -> BufferedWriter for reduced IO operations

Improvement Open Minor Unresolved Unassigned bd2019us bd2019us 11/May/19 14:43   29/May/19 10:22           0 1 0 3600   When FileWriter is used within a loop, the amount of IO operations can be reduced by replacing FileWriter with BufferedWriter. 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch
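The change amounts to wrapping the FileWriter in a BufferedWriter so the small writes inside the loop are coalesced into fewer underlying write calls. A self-contained sketch (using a StringWriter as the sink so it runs anywhere; the real code writes to a file):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class BufferedWriteDemo {
    // Each out.write() goes to an in-memory buffer; the underlying writer
    // (a FileWriter in the real code) sees far fewer calls.
    static String writeLines(String[] lines) throws IOException {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        } // close() flushes the buffer to the sink
        return sink.toString();
    }
}
```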
44 weeks, 5 days ago 0|z02lpk:
ZooKeeper ZOOKEEPER-3386

Add admin command to display voting view

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 10/May/19 21:12   25/Jul/19 23:51 25/Jul/19 17:43 3.6.0 3.6.0 server   0 2 0 12000   Solid agreement on the set of voting servers is a necessity for ZooKeeper and it's useful to audit that agreement to validate it does not drift into some pathological condition.

 

Create an admin command that exposes the ensemble voting members from the point of view of the queried server.
100% 100% 12000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 6 days ago 0|z02lfc:
ZooKeeper ZOOKEEPER-3385

Add admin command to display leader

Improvement Resolved Trivial Fixed Unassigned Brian Nixon Brian Nixon 10/May/19 20:59   05/Jun/19 10:37 04/Jun/19 18:55 3.6.0 3.6.0 server   0 2 0 11400   Each QuorumPeer prints the identity of the server it believes is the leader in its logs but that is not easily turned into diagnostic information about the state of the ensemble. It can be useful in debugging various issues, both when a quorum is struggling to be established and when a minority of peers are failing to follow, to see at a glance which peers are following the leader elected by the majority and which peers are either not following or following a different server.

Create an admin command that exposes which server a peer believes is the current leader.
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks, 1 day ago 0|z02leo:
ZooKeeper ZOOKEEPER-3384

ZOOKEEPER-3451 Avoid long quorum unavailable time due to TLS connection close stalled with full send buffer

Sub-task Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 09/May/19 19:43   14/Dec/19 06:06     3.7.0 server   0 0 0 7200    

*Problem*

For an SSL socket, when calling close(), it is required to send a close_notify alert before closing the write side of the connection. In case the leader is partitioned away, the learner shutdown may take a long time if the send buffer is full, because it will block on sending the close_notify packet.

From the SSLSocketImpl implementation, it still honors the SO_LINGER socket option; the difference is that even if we set the SO_LINGER time to 0 it will still try to issue the close_notify packet, but it fails immediately and closes the socket if it cannot acquire the write lock right away.

Setting SO_LINGER to a small number avoids stalling for a long time during shutdown; this is what we're going to do here.

*Any Cons of doing this?*

From the TLS RFC, the close handshake was added to avoid a truncation attack, where an attacker inserts into a message a TCP code indicating the message has finished, thus preventing the recipient from picking up the rest of the message. But it's fine if the peer doesn't send close_notify in some cases, for example if the client crashed or was killed. For us, the close_notify usually has no chance to be sent during a rolling restart anyway.

Another thing mentioned in the RFC is that failing to send close_notify prevents the SSL session from being resumed. Given that session resumption does not benefit the ZooKeeper quorum anyway, this is not a problem for us.
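Setting the linger option is a one-liner on the socket. A sketch, shown on a plain java.net.Socket for illustration (the real change applies to the learner's SSL socket, where SSLSocketImpl honors the same option):

```java
import java.net.Socket;
import java.net.SocketException;

class LingerSketch {
    // With SO_LINGER enabled and a small timeout, close() is bounded:
    // the TLS layer still attempts close_notify but gives up quickly.
    static int configureLinger(Socket socket, int lingerSeconds) throws SocketException {
        socket.setSoLinger(true, lingerSeconds);
        return socket.getSoLinger();
    }

    static int demo() throws SocketException {
        return configureLinger(new Socket(), 1);
    }
}
```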
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks ago 0|z02k1s:
ZooKeeper ZOOKEEPER-3383

ZOOKEEPER-3245 Improve prep processor metric accuracy and de-flaky unit test

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 09/May/19 16:53   10/May/19 09:18 10/May/19 06:22   3.6.0 metric system   0 2 0 3000   100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
44 weeks, 6 days ago 0|z02jwo:
ZooKeeper ZOOKEEPER-3382

Update Documentation: If you only have one storage device

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 09/May/19 14:57   29/May/19 18:52 29/May/19 11:14 3.6.0, 3.5.5, 3.4.15 3.6.0 documentation   0 2 0 1800   {quote}
If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it should mitigate it.
{quote}

'trace files' are no longer available in ZooKeeper; remove the mention of them. Also, there is no configuration named 'snapshotCount'; update it.
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 1 day ago 0|z02ju0:
ZooKeeper ZOOKEEPER-3381

Add multi watchregistration support for multiRead operation

Improvement Open Minor Unresolved Peter Szecsi Peter Szecsi Peter Szecsi 08/May/19 09:32   14/Jun/19 20:42   3.6.0   java client, tests   0 1   Currently, the client API only supports to register one watch attached to one node for a single request. However, for complete support of the multi version of the {{GetChildren}} this functionality needs to be extended. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 1 day ago 0|z02i08:
ZooKeeper ZOOKEEPER-3380

A revisit of the quota mechanism

Improvement Open Major Unresolved Unassigned maoling maoling 07/May/19 23:24   14/Dec/19 06:08     3.7.0 server   0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks, 1 day ago 0|z02hk0:
ZooKeeper ZOOKEEPER-3379

ZOOKEEPER-3245 De-flaky test in Quorum Packet Metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 07/May/19 17:26   27/May/19 22:21 27/May/19 16:34   3.6.0 metric system   0 2 0 18600   100% 100% 18600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
42 weeks, 2 days ago 0|z02hbk:
ZooKeeper ZOOKEEPER-3378

Set the quorum cnxn timeout independently from syncLimit

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 07/May/19 17:18   20/May/19 22:36 20/May/19 17:15   3.6.0 quorum   0 2 0 1200   If an ensemble requires a high sync limit to support a large data tree or transaction rate, it can cause the QuorumCxnManager to hang over-long in response to quorum events. Using the sync limit for this timeout is a convenience in terms of keeping all failure detection mechanisms in sync but it is not strictly required for correct behavior. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 2 days ago 0|z02haw:
ZooKeeper ZOOKEEPER-3377

Add a new CLI:exit

New Feature Resolved Minor Invalid maoling maoling maoling 07/May/19 05:29   07/May/19 22:37 07/May/19 22:36 3.6.0   scripts   0 1   [zk: 127.0.0.1:2180(CONNECTED) 0] exit                                                                                     ~bin 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 1 day ago 0|z02ggo:
ZooKeeper ZOOKEEPER-3376

Create a Maven module for Metrics Providers API

Improvement In Progress Major Unresolved Enrico Olivelli Enrico Olivelli Enrico Olivelli 01/May/19 07:39   14/Dec/19 06:08   3.6.0 3.7.0 build, metric system   0 1   Once we get rid of the Ant build we can package the Metrics Provider APIs in a separate module.

This way providers won't need to depend on the ZooKeeper server module and we will have a better structure.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
46 weeks, 1 day ago 0|z02aww:
ZooKeeper ZOOKEEPER-3375

ZOOKEEPER-3451 Docs enhancements for 3.5 release

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 30/Apr/19 11:59   01/Jul/19 10:53 03/May/19 07:36 3.5.4 3.5.5 documentation   0 1 0 2400   Add to release notes and README:
* From the 3.5.5 release, if ZooKeeper is built with Java 8, then Java 8u211+ should be used.

Add to Quorum TLS docs:
* Certificates should be generated on a per-machine basis, not per ZK instance

 
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 6 days ago 0|z02a3k:
ZooKeeper ZOOKEEPER-3374

Use MultiRead operation in the BFS utility

Improvement Open Minor Unresolved Peter Szecsi Peter Szecsi Peter Szecsi 29/Apr/19 08:41   14/Jun/19 20:45   3.6.0   java client 30/Apr/19 0 1 86400 86400 0% The {{multiRead}} operation allows us to traverse a tree using fewer requests to the server (by using batched {{getChildren}} operations). At the moment, the number of requests for a traversal is the same as the number of nodes; however, this can be changed to use only as many requests as the height of the tree.

Currently, the {{listSubTreeBFS}} utility function is used for the {{deleteAll}} command, and this improvement makes it more robust, especially in high-latency setups.
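The round-trip arithmetic can be illustrated with a level-order traversal where each frontier is fetched in one batched call, so the number of round trips equals the tree height rather than the node count (a sketch: plain maps stand in for the server, and the inner loop represents one hypothetical multiRead request):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class BatchedBfs {
    // Sketch: each while-iteration corresponds to ONE batched read of the
    // whole frontier, i.e. one round trip per tree level.
    static List<String> listSubTreeBfs(Map<String, List<String>> children, String root) {
        List<String> result = new ArrayList<>();
        List<String> frontier = Collections.singletonList(root);
        while (!frontier.isEmpty()) {
            result.addAll(frontier);
            List<String> next = new ArrayList<>();
            // one hypothetical multiRead(frontier) round trip:
            for (String path : frontier) {
                next.addAll(children.getOrDefault(path, Collections.emptyList()));
            }
            frontier = next;
        }
        return result;
    }
}
```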
0% 0% 86400 86400 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
46 weeks, 3 days ago 0|z028jc:
ZooKeeper ZOOKEEPER-3373

need change description for "Single System Image" guarantee in document

Bug Resolved Minor Fixed Jiahongchao Jiahongchao Jiahongchao 28/Apr/19 05:19   06/Aug/19 00:49 05/Aug/19 19:51 3.4.14 3.6.0 documentation   0 2 0 6600   In website, "Single System Image" is "A client will see the same view of the service regardless of the server that it connects to."

I want to change it to "Once connected, a client will see the same view of the service even if it switches to another server"

Because the old one is a little misleading: if the cluster has an outdated follower and a normal follower, I don't think a client will see the same view of the service regardless of the server it connects to on its first connection.
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
32 weeks, 2 days ago 0|z027m8:
ZooKeeper ZOOKEEPER-3372

Cleanup pom.xml in order to let Maven clients import as few dependencies as possible

Improvement Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 27/Apr/19 17:43   20/May/19 13:50 29/Apr/19 07:49 3.6.0, 3.5.5 3.6.0, 3.5.5 java client   0 2 0 8400   The ZooKeeper client artifact imports a lot of third-party dependencies that are automatically inherited by Maven client applications, that is, applications that depend on the 'client'.

This task is to clean up the final resulting pom of the main artifact consumed by "clients" as much as possible.

 
100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
46 weeks, 3 days ago 0|z027f4:
ZooKeeper ZOOKEEPER-3371

Port unification for admin server

New Feature Resolved Major Fixed Eric Lee Eric Lee Eric Lee 22/Apr/19 18:58   29/Jul/19 15:36 29/Jul/19 11:09 3.6.0 3.6.0 security   0 3 0 18600   This issue provides the Jetty admin server with port unification, meaning both secure and insecure connections can be established on the same port. By default, this feature is disabled. It can be enabled by passing "zookeeper.admin.portUnification" as a command-line argument. 100% 100% 18600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 3 days ago 0|z0211k:
ZooKeeper ZOOKEEPER-3370

Remove SVN specific revision generation

Improvement Closed Major Fixed Zili Chen Zili Chen Zili Chen 18/Apr/19 17:18   16/Oct/19 14:58 01/Jul/19 09:17   3.6.0, 3.5.6 build   0 2 0 8400   Continue the SVN to Git Port 100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
37 weeks, 3 days ago 0|z01xxc:
ZooKeeper ZOOKEEPER-3369

Maven release artifacts cleanup

Improvement Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 18/Apr/19 05:06   20/May/19 13:50 25/Apr/19 08:56 3.6.0, 3.5.5 3.6.0, 3.5.5 build   0 2 0 7200   - Change source tarball name from
zookeeper-3.5.5-source-package.tar.gz
to
apache-zookeeper-3.5.5.tar.gz

- After unpacking the tarballs the top level dir should match the archive's name.

- Missing API docs after mvn clean install and in the binary package. Fix the README on this:
"Full documentation for this release can also be found in docs/index.html"
Perhaps we should have a general README and a binary-specific README?

- Correct "The release artifact contains the following jar files in
"dist-maven" directory" - no dist-maven dir anymore.

- Include license file for netty-all-4.1.29.Final.jar
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
47 weeks ago 0|z01wyo:
ZooKeeper ZOOKEEPER-3368

Add a new cli:version

New Feature Resolved Minor Duplicate maoling maoling maoling 17/Apr/19 22:28   04/May/19 23:24 04/May/19 23:24 3.6.0       0 1 0 7800   [zk: 127.0.0.1:2180(CONNECTED) 0] version
ZooKeeper version: 3.6.0-SNAPSHOT-29f9b2c1c0e832081f94d59a6b88709c5f1bb3ca, built on 04/16/2019 09:16 GMT
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 4 days ago 0|z01wjc:
ZooKeeper ZOOKEEPER-3367

Zookeeper 3.4.14 Maven jar pulls in spotbugs-annotations, which is under LGPL license

Bug Open Major Unresolved Unassigned Stig Rohde Døssing Stig Rohde Døssing 17/Apr/19 06:43   17/Apr/19 06:43   3.4.14       0 2   Pulling in Zookeeper 3.4.14 in a Maven build results in spotbugs-annotations also being pulled in as a dependency.

{quote}
[INFO] \- org.apache.zookeeper:zookeeper:jar:3.4.14:compile
[INFO] +- com.github.spotbugs:spotbugs-annotations:jar:3.1.9:compile
{quote}

Since spotbugs-annotations is under LGPL license, it would ideally be used only during the build, and not be pulled in when users depend on Zookeeper.
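Until the packaging is fixed, consumers can drop the dependency themselves. A sketch of the usual Maven workaround (an exclusion in the consuming project's POM, using the versions from the report):

```xml
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.14</version>
  <exclusions>
    <exclusion>
      <groupId>com.github.spotbugs</groupId>
      <artifactId>spotbugs-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

This is safe because the annotations are only needed at build time of ZooKeeper itself, not at runtime by its consumers.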
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
48 weeks, 1 day ago 0|z01vk0:
ZooKeeper ZOOKEEPER-3366

ZOOKEEPER-3092 Pluggable metrics system for ZooKeeper - move remaining metrics to MetricsProvider

Sub-task Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 16/Apr/19 04:43   04/Jun/19 10:01 04/Jun/19 04:55 3.6.0 3.6.0 metric system   0 2 0 21600   There are a bunch of metrics exposed by the Monitor Command which are not implemented using ServerMetrics; we have to move all of them to ServerMetrics, or at least move them to the new metrics framework 100% 100% 21600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks, 2 days ago 0|z01tsg:
ZooKeeper ZOOKEEPER-3365

Use Concurrent HashMap in NettyServerCnxnFactory

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 13/Apr/19 23:00   26/Jun/19 18:23 24/Jun/19 05:07 3.6.0 3.6.0 server   0 3 0 13200   {code:java|title=NettyServerCnxnFactory.java}
// Access to ipMap or to any Set contained in the map needs to be
// protected with synchronized (ipMap) { ... }
private final Map<InetAddress, Set<NettyServerCnxn>> ipMap = new HashMap<>();

private void addCnxn(NettyServerCnxn cnxn) {
cnxns.add(cnxn);
synchronized (ipMap){
InetAddress addr =
((InetSocketAddress)cnxn.getChannel().remoteAddress()).getAddress();
Set<NettyServerCnxn> s = ipMap.get(addr);
if (s == null) {
s = new HashSet<>();
ipMap.put(addr, s);
}
s.add(cnxn);
}
}
{code}

This can be done better (less code, less contention) with the Java 8 Map API. Although, as I look at this, the only thing it is used for is a count of the number of connections from each address. Maybe it should just store a count instead of a collection.

https://github.com/apache/zookeeper/blob/f69ad1b0fed88da3c1b67fd73031e7248c0564f7/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java

Also note that an exclusive lock is required with each interaction of the table. By moving to a {{ConcurrentHashMap}}:

bq. Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove).

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html

Removing this lock should improve ZK's performance for highly concurrent client workloads, especially since it uses async Netty operations, unless of course there are other locks elsewhere.
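A sketch of the suggested Java 8 rewrite (simplified to String keys so it stands alone; the real map is keyed by InetAddress and holds NettyServerCnxn objects):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CnxnTracker {
    // computeIfAbsent creates the per-address set atomically, so the
    // explicit synchronized (ipMap) block is no longer needed; reads
    // like connectionCount() no longer contend with writers at all.
    private final Map<String, Set<String>> ipMap = new ConcurrentHashMap<>();

    void addCnxn(String addr, String cnxn) {
        ipMap.computeIfAbsent(addr, k -> ConcurrentHashMap.newKeySet()).add(cnxn);
    }

    int connectionCount(String addr) {
        Set<String> s = ipMap.get(addr);
        return s == null ? 0 : s.size();
    }
}
```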
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
38 weeks, 3 days ago 0|z01rh4:
ZooKeeper ZOOKEEPER-3364

Compile with strict options in order to check code quality

Improvement Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 13/Apr/19 16:12   22/May/19 01:06 08/May/19 15:55 3.6.0 3.6.0 build   0 2 0 28200   In order to dismiss old QA tests based on Ant (ZOOKEEPER-3351) we have to enforce code quality by activating some flags on javac at build time, namely:

 
{code:java}
<compilerArgs>
   <compilerArg>-Werror</compilerArg>
   <compilerArg>-Xlint:deprecation</compilerArg>
   <compilerArg>-Xlint:unchecked</compilerArg>
   <!-- https://issues.apache.org/jira/browse/MCOMPILER-205 -->
   <compilerArg>-Xpkginfo:always</compilerArg>
</compilerArgs>{code}
100% 100% 28200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks ago 0|z01rco:
ZooKeeper ZOOKEEPER-3363

Drop Ant-based build umbrella issue

Task Open Major Unresolved Enrico Olivelli Enrico Olivelli Enrico Olivelli 13/Apr/19 04:40   14/Dec/19 06:07     3.7.0 build   0 2 0 4800   This is an umbrella issue to track activities related to dropping the Ant-based build now that we have (since 3.5.5) Maven fully working. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
17 weeks, 6 days ago 0|z01r4o:
ZooKeeper ZOOKEEPER-3362

Create a simple checkstyle file

Task Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 12/Apr/19 15:17   16/Oct/19 14:59 08/May/19 11:51 3.6.0 3.6.0, 3.5.6 build   1 2 0 17400   Create a basic checkstyle file, in order to cover the minimal check on @author tags.

This is needed in order to drop the old ANT-based precommit job (see ZOOKEEPER-3351).

We will not remove the legacy checkstyle configuration file in zookeeper-server/src/test/resources/checkstyle.xml because it is referenced by the ANT build.xml files (even if we are not actually using that target).

This task won't add a complete checkstyle configuration with the usual checks, because that would imply a change to almost every .java file in the codebase.
45 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3361

ZOOKEEPER-1407 Add multi version of getChildren request

Sub-task Resolved Minor Won't Do Peter Szecsi Peter Szecsi Peter Szecsi 12/Apr/19 08:40   01/Aug/19 16:18 01/Aug/19 16:18     java client, server, tests   0 2 1209600 1193400 16200 1% There is already a multi operation for {{delete, create, setData}}... However, {{getChildren}} has no variant that gets the children of multiple nodes with one message.
This could greatly improve the efficiency of a traversal (e.g. breadth-first search) when latency is high (>1ms). In such a case, a simple {{deleteAll}} algorithm on 10k nodes takes at least (1ms * 10000 * 2 =) 20 sec just to acquire the list of nodes selected for deletion (it has to check for every node whether it has children or not).
I would add a version of the {{getChildren}} function to the ZooKeeper API which accepts lists as well (containing node paths) and returns their children, and introduce a new request type. This way backward compatibility would not be broken, but ZK could provide a more robust solution for those who have latency issues.
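The per-level batching described above can be sketched as follows. The {{multiGetChildren}} call is a hypothetical stand-in for the proposed request type, backed here by an in-memory map so the sketch is runnable; with one batched round trip per tree *level* rather than per node, the 10k-node traversal above would cost on the order of tree-depth round trips instead of 20 sec.

```java
import java.util.*;

// Sketch (hypothetical API): breadth-first traversal that fetches the
// children of the whole frontier in one batched call.
public class BatchTraversal {
    // Stand-in for the proposed batched getChildren request; backed by a
    // plain in-memory map so the example runs without a server.
    static Map<String, List<String>> multiGetChildren(
            Map<String, List<String>> tree, List<String> paths) {
        Map<String, List<String>> out = new HashMap<>();
        for (String p : paths) {
            out.put(p, tree.getOrDefault(p, Collections.emptyList()));
        }
        return out;
    }

    // One round trip per tree level rather than one per node.
    static List<String> bfs(Map<String, List<String>> tree, String root) {
        List<String> visited = new ArrayList<>();
        List<String> frontier = Collections.singletonList(root);
        while (!frontier.isEmpty()) {
            visited.addAll(frontier);
            Map<String, List<String>> batch = multiGetChildren(tree, frontier);
            List<String> next = new ArrayList<>();
            for (String p : frontier) {
                next.addAll(batch.get(p));
            }
            frontier = next;
        }
        return visited;
    }
}
```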
48 weeks, 6 days ago ZOOKEEPER-3402 already implemented a more general approach which solves this issue as well.
ZooKeeper ZOOKEEPER-3360

Misprint in WriteLock javadoc

Improvement Resolved Trivial Fixed Unassigned Igor Rudenko Igor Rudenko 10/Apr/19 16:24   23/Jan/20 13:17 03/May/19 11:28 3.5.5 3.6.0 recipes   0 3 0 3000   Any
{code:java}
* @param acls the acls that you want to use for all the paths,
{code}

{code:java}
public WriteLock(ZooKeeper zookeeper, String dir, List<ACL> acl){code}
45 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3359

Batch commits in the CommitProcessor

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 10/Apr/19 15:22   30/Jul/19 00:27 29/Jul/19 20:31 3.6.0 3.6.0 quorum   0 2 0 19800   Draining a single commit every time the CommitProcessor switches to commit mode can add to the backlog of committed messages. Instead, add controls to batch and drain multiple commits and limit the number of reads being served. Improves commit throughput and adds backpressure on reads. 100% 100% 19800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3358

Make Snappy The Default Snapshot Compression Algorithm

Improvement Open Major Unresolved Unassigned David Mollitor David Mollitor 10/Apr/19 10:20   14/Dec/19 06:06   3.6.0 3.7.0 server   0 2   Now that snapshots are compressed, thanks to [ZOOKEEPER-3179], make Snappy the default compression algorithm.
45 weeks ago
ZooKeeper ZOOKEEPER-3357

Remove Dead Link from ZooKeeper Programmer's Guide

Improvement Open Trivial Unresolved David Mollitor David Mollitor David Mollitor 09/Apr/19 13:39   05/Feb/20 07:16   3.5.4, 3.6.0 3.7.0, 3.5.8 documentation   0 2 0 5400   Remove the dead link to _ZooKeeper Talk at the Hadoop Summit 2008_.
25 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3356

Request throttling in Netty is not working as expected and could cause direct buffer OOM issue

Bug Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 09/Apr/19 12:20   24/Jul/19 15:45 23/Jul/19 09:56 3.5.4, 3.6.0 3.6.0 server   0 3 0 10800   The current Netty enable/disable-recv implementation may cause a direct buffer OOM, because reads may be enabled long enough to pull in a large chunk of packets and only disabled again after a single ZK request has been consumed. We have seen this problem occasionally in production.
 
We need more advanced flow control in Netty instead of relying on AUTO_READ. We have improved this internally by enabling/disabling recv based on the queuedBuffer size, and will upstream it soon.
 
With this implementation, the maximum Netty queued buffer size (direct memory usage) will be 2 * recv_buffer size. It is not bounded by the per-message size because, in epoll ET mode, Netty tries to read until the socket is empty, and because SslHandler triggers another read when it has received a partial encrypted packet and has not yet emitted any decrypted message.
34 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3355

Remove 'tbd' From Docs

Improvement Open Trivial Unresolved David Mollitor David Mollitor David Mollitor 09/Apr/19 11:56   05/Feb/20 07:15   3.5.4, 3.6.0 3.7.0, 3.5.8 documentation   0 3 0 4200   For years, there have been lots of 'tbd' recorded in the documentation. It does not look very polished and there is no one working on these. I think it's time to finally remove them. 100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
38 weeks ago
ZooKeeper ZOOKEEPER-3354

Improve efficiency of DeleteAllCommand

Improvement Resolved Trivial Fixed Unassigned Brian Nixon Brian Nixon 09/Apr/19 01:34   05/Jun/19 23:42 05/Jun/19 18:02 3.6.0 3.6.0 other   0 2 0 10200   The cli DeleteAllCommand internally uses a synchronous iterative formula. This can be improved with batching for quicker response time on large subtrees. 100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks ago
ZooKeeper ZOOKEEPER-3353

Admin commands for showing initial settings

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 09/Apr/19 01:20   15/May/19 10:40 15/May/19 05:57 3.6.0 3.6.0 server   0 2 0 9600   It can be useful for a sysadmin to know the settings that were initially used to configure a given ZooKeeper server. Some of these can be read from the process logs and others from the Java args in the process description, but if, for example, the zoo.cfg file used at startup is overwritten without the process itself being restarted, then it can be difficult to know exactly what is currently running on the JVM.

Produce admin commands (and four-letter commands) to answer these questions.
44 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3352

Use LevelDB For Backend

New Feature Open Critical Unresolved David Mollitor David Mollitor David Mollitor 08/Apr/19 16:58   09/May/19 17:23     4.0.0 server   0 5   Use LevelDB for managing data stored in ZK (transaction logs and snapshots).

https://stackoverflow.com/questions/6779669/does-leveldb-support-java
45 weeks ago
ZooKeeper ZOOKEEPER-3351

Migrate qa-test-pullrequest ant task to maven

Improvement Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 08/Apr/19 16:31   16/Jun/19 06:51 16/Jun/19 06:51 3.5.5 3.6.0 build   0 4 0 8400   ZOOKEEPER-3409 In order to drop ANT we have to migrate task qa-test-pullrequest to Maven.

That task is currently called this way in ASF Jenkins:
{code:java}
#!/bin/bash
set +x

#export JAVA_HOME=/home/jenkins/tools/java/jdk1.7.0-64
export ANT_HOME=/home/jenkins/tools/ant/apache-ant-1.9.9

#export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:
export PATH=$PATH:$LATEST1_8_HOME/bin:$ANT_HOME/bin:

export PATCH_DIR=${WORKSPACE}/patchprocess
if [ ! -e "$PATCH_DIR" ] ; then
    mkdir -p $PATCH_DIR
fi

pwd
git status
git rev-parse HEAD

which java
java -version
ulimit -a

env

${ANT_HOME}/bin/ant \
        -Dpatch.file=foobar \
        -Dscratch.dir=$PATCH_DIR \
        -Dps.cmd=/bin/ps \
        -Dwget.cmd=/usr/bin/wget \
        -Djiracli.cmd=/home/jenkins/tools/jiracli/latest/jira.sh \
        -Dgit.cmd=/usr/bin/git \
        -Dgrep.cmd=/bin/grep \
        -Dpatch.cmd=/usr/bin/patch \
        -Dfindbugs.home=/home/jenkins/tools/findbugs/latest/ \
        -Dforrest.home=/home/jenkins/tools/forrest/latest/ \
        -Djira.passwd=xxxxxxxx \
        -Djava5.home=/home/jenkins/tools/java5/latest/ \
        -Dcurl.cmd=/usr/bin/curl \
        -Dtest.junit.maxmem=2g \
        qa-test-pullrequest{code}
48 weeks, 5 days ago ant precommit is not used anymore on master branch.
It is active on branch-3.5 and branch-3.4
ZooKeeper ZOOKEEPER-3350

Get rid of CommonNames

Improvement Resolved Major Fixed Zili Chen Zili Chen Zili Chen 08/Apr/19 05:04   03/May/19 17:46 03/May/19 11:39 3.5.4 3.6.0 jmx   0 3 0 1800   Inside {{CommonNames}} it says {{TODO: will get rid of it eventually.}}.

However, I don't see the reason for such a removal, and since the comment dates from over ten years ago I'd like to know whether it is still valid.

cc [~phunt] [~anmolnar]
45 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3349

QuorumCnxManager socketTimeout unused

New Feature Resolved Trivial Not A Problem Brian Nixon Brian Nixon Brian Nixon 04/Apr/19 18:55   19/Dec/19 17:59 20/May/19 19:07 3.6.0   quorum   0 1 0 8400   QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the class. It's clear from the context that it should either be removed entirely or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit can be changed by jmx, I'm thinking that the former is the better solution.

 
43 weeks, 3 days ago
ZooKeeper ZOOKEEPER-3348

Make TxnLog and TxnLog Iterator Closable

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 04/Apr/19 09:49   12/Apr/19 23:01 12/Apr/19 13:07   3.6.0 server   0 2 0 2400   [https://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html] 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
48 weeks, 5 days ago
ZooKeeper ZOOKEEPER-3347

Improve PathTrie Consistency

Improvement Resolved Major Fixed David Mollitor David Mollitor David Mollitor 02/Apr/19 17:33   09/Aug/19 16:17 09/Aug/19 10:15   3.6.0 server   0 2 0 13800   There is a bunch of synchronization that occurs in the {{PathTrie}}. Each node in the tree requires a lock to view its children, so to traverse a tree that is 8 nodes deep, it is required to lock 8 different times. Also, I'm not really sure that the locking is consistent; a node deep in the tree can be negatively impacted by another thread deleting the node's parent at the same time. 100% 100% 13800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
31 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3344

Write a new script, zkSnapShotToolkit.sh, to encapsulate SnapshotFormatter and document the usage

New Feature Resolved Major Fixed maoling maoling maoling 30/Mar/19 04:52   06/Aug/19 11:41 06/Aug/19 06:57 3.6.0 3.6.0 scripts   0 2 0 3000   Write a new script, *zkSnapShotToolkit.sh*, to encapsulate *SnapshotFormatter.java*, just like *zkTxnLogToolkit.sh*, for the users' convenience.
32 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3343

Add a new doc: zookeeperTools.md

New Feature Resolved Major Fixed maoling maoling maoling 30/Mar/19 04:26   03/May/19 17:46 03/May/19 11:08 3.5.4 3.6.0 documentation   0 3 0 3000   Write a zookeeper tools doc [3.7], which includes:

  - all usages of the shell scripts under zookeeper/bin (e.g. zkTxnLogToolkit.sh, zkCleanup.sh)

  - benchmark tool

  - backup tool

  - test tools: jepsen
45 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3342

Use StandardCharsets

Improvement Open Major Unresolved David Mollitor David Mollitor David Mollitor 28/Mar/19 21:27   22/Jan/20 09:14     3.7.0 server   0 2 0 10200   {quote}
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array. The behavior of this method when this string cannot be encoded in the default charset is unspecified.

https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes--
{quote}

Since this is a distributed system, it is always possible that different nodes have different default charsets defined, so it is safest to specify the charset explicitly across all nodes. You could, for example, see a situation where an upgraded JVM uses a different default, and during a rolling upgrade of the JVM different nodes now have different defaults.

* The default charset is usually "ISO-8859-1". UTF-8 covers more of our international friends.
* Explicitly specifying the CharSet yields slight performance gains
* Explicitly specifying the CharSet removes the need for try/catch blocks of UnsupportedEncodingException


https://blog.codecentric.de/en/2014/04/faster-cleaner-code-since-java-7/
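A minimal sketch of the explicit-charset pattern (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;

// Encode and decode with an explicit charset so every node in the
// ensemble agrees on the byte representation, regardless of the JVM's
// platform default.
public class CharsetSafe {
    public static byte[] encode(String s) {
        // No UnsupportedEncodingException to catch: StandardCharsets.UTF_8
        // is guaranteed to be present on every JVM.
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```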
38 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3341

Remove Superfluous ByteBuffer Duplicate

Improvement Resolved Trivial Fixed David Mollitor David Mollitor David Mollitor 28/Mar/19 17:03   09/Apr/19 20:27 09/Apr/19 10:08   3.6.0 server   0 2 0 4800   {code:java|title=QuorumCnxManager.java}
byte[] msgArray = new byte[length];
din.readFully(msgArray, 0, length);
ByteBuffer message = ByteBuffer.wrap(msgArray);
addToRecvQueue(new Message(message.duplicate(), sid));
{code}

The {{message}} is being duplicated and the original is GC'ed. Just pass the {{message}}; do not bother with making a duplicate. I think this is a copy+paste bug.

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L1195-L1198
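A sketch of the proposed simplification, with the surrounding class reduced to one method (everything except the local variable names is illustrative): wrap the received bytes once and hand the buffer straight to the consumer, with no {{duplicate()}} call.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch of the simplification: the wrapper is used exactly once, so
// duplicating it only creates garbage.
public class RecvSketch {
    public static ByteBuffer readMessage(DataInputStream din, int length)
            throws IOException {
        byte[] msgArray = new byte[length];
        din.readFully(msgArray, 0, length);
        // Pass the wrapped buffer directly; no duplicate() needed.
        return ByteBuffer.wrap(msgArray);
    }
}
```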
49 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3340

Introduce CircularBlockingQueue in QuorumCnxManager.java

Improvement Resolved Major Fixed David Mollitor David Mollitor David Mollitor 28/Mar/19 16:48   13/Nov/19 16:44 13/Nov/19 14:42   3.6.0 server   0 2 0 13800   I was recently profiling a ZK Quorum Leader in a low-volume environment and noticed that most of its time was spent in {{QuorumCnxManager#RecvWorker}}. Nothing wrong with that, but it did draw my attention to it. I noticed that {{Queue}} interactions are a bit... verbose. I would like to propose that we streamline this area of the code.
 

[https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L1291-L1309]


This proposed JIRA should not be viewed simply as 'ArrayBlockingQueue' v.s. 'CircularBlockingQueue'.

One of the things that this PR does is remove the need for double-locking. For example in addToRecvQueue the following condition exists:
{code}
public void addToRecvQueue(Message msg) {
    synchronized (recvQLock) {
        if (recvQueue.remainingCapacity() == 0) {
            try {
{code}

From here it can be observed that two locks are obtained: {{recvQLock}} and the one internal to {{recvQueue}}. This is required because there are multiple interactions that this Manager wants to perform on the queue in a serialized way. The {{CircularBlockingQueue}} performs all of those actions on behalf of the caller, but it does so internally, under a single lock: the one internal to {{CircularBlockingQueue}}.

The current code also has some race conditions that are simply ignored when they happen. The race conditions are detailed nicely in the code comments here. However, the changes in this PR directly deal with, and eliminate, these race conditions altogether, since all actions against the {{CircularBlockingQueue}} occur within its internal lock. This greatly simplifies the code and removes the need for new folks to learn this nuance of "why is the code doing this."
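An illustrative single-lock sketch of the idea (this is not ZooKeeper's actual CircularBlockingQueue, just a minimal model of drop-oldest-when-full under one monitor):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Bounded queue that, under a single intrinsic lock, drops the oldest
// element when full instead of blocking the producer. The
// "check capacity, then remove, then add" sequence is one serialized
// step, so no outer lock is ever needed.
public class CircularBlockingQueueSketch<E> {
    private final Deque<E> deque = new ArrayDeque<>();
    private final int capacity;

    public CircularBlockingQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    // Insert-or-overwrite happens atomically under one monitor.
    public synchronized void offer(E e) {
        if (deque.size() == capacity) {
            deque.pollFirst(); // discard the oldest element
        }
        deque.addLast(e);
        notifyAll();
    }

    // Blocks until an element is available.
    public synchronized E take() throws InterruptedException {
        while (deque.isEmpty()) {
            wait();
        }
        return deque.pollFirst();
    }
}
```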
18 weeks, 1 day ago
ZooKeeper ZOOKEEPER-3339

Improve Debug and Trace Log Statements

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 28/Mar/19 10:45   03/Aug/19 06:16 03/Aug/19 02:43   3.6.0 server   0 2 0 14400   SLF4J supports an advanced feature called parameterized logging, which can significantly boost logging performance for disabled logging statements. Review all logging and ensure that it adheres to parameterized logging.

https://www.slf4j.org/faq.html#logging_performance
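A JDK-only sketch of the parameterized-logging idea (the class here is hypothetical; SLF4J's real implementation differs): the message is assembled only when the level is enabled, so a disabled debug statement costs one boolean check instead of string concatenation.

```java
// Hypothetical lazy logger modeling SLF4J's {} placeholder behavior.
public class LazyLog {
    private final boolean debugEnabled;

    public LazyLog(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Returns the rendered message, or null when debug is disabled and
    // no formatting work was performed at all.
    public String debug(String template, Object arg) {
        if (!debugEnabled) {
            return null; // String.valueOf/replace are skipped entirely
        }
        return template.replace("{}", String.valueOf(arg));
    }
}
```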
32 weeks, 5 days ago
ZooKeeper ZOOKEEPER-3338

Review of BufferStats Class

Improvement Open Trivial Unresolved David Mollitor David Mollitor David Mollitor 28/Mar/19 09:02   14/Dec/19 06:06     3.7.0 server   0 1 0 2400   * Faster to use StringBuilder than String Format
* Tidy up
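A small sketch of the first claim (illustrative names, not the actual BufferStats code): both methods render the same string, but the StringBuilder version skips format-string parsing.

```java
// Two equivalent ways to render a stats line.
public class StatsFormat {
    public static String withFormat(int count, long bytes) {
        return String.format("count=%d bytes=%d", count, bytes);
    }

    public static String withBuilder(int count, long bytes) {
        // Pre-sized builder, no format-string parsing at runtime.
        return new StringBuilder(32)
                .append("count=").append(count)
                .append(" bytes=").append(bytes)
                .toString();
    }
}
```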
51 weeks ago
ZooKeeper ZOOKEEPER-3337

Maven build failed with user or group id is too big

Bug Patch Available Major Unresolved Andrew Kyle Purtell Andrew Kyle Purtell Andrew Kyle Purtell 27/Mar/19 16:44   09/Apr/19 05:47   3.4.13       0 5   Maven assembly plugin configuration must specify tarLongFileMode of "posix", not "gnu".

Otherwise if the user or group id is too large the build will fail. For example:
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single
(source-package) on project zookeeper: Execution source-package of goal
org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: user id '1754762210'
is too big ( > 2097151 ). -> [Help 1]
{noformat}
A very common problem; many other projects here have had to fix this.
49 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3336

Leader election terminated, two leaders or not following leader or not having state

Bug Open Major Unresolved Unassigned Simin Oraee Simin Oraee 27/Mar/19 08:36   10/May/19 06:00   3.4.13   leaderElection   0 3   Debian, Java 8 I am working on a testing tool for distributed systems. I tested Zookeeper, enforcing different possible orderings of events. I encountered some inconsistencies in the election of the leader. Here are the logs of 3 completed executions.

I am wondering if these behaviors are expected or not.

1) More than one node considers itself the leader:
NodeCrashEvent\{id=1, nodeId=0}
NodeStartEvent\{id=7, nodeId=0}
MessageEvent\{id=8, predecessors=[7], from=0, to=0, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=9, predecessors=[8, 7], from=0, to=1, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=10, predecessors=[9, 7], from=0, to=2, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=5, predecessors=[], from=1, to=0, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=12, predecessors=[5, 10, 7], from=0, to=0, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=13, predecessors=[12, 5, 7], from=0, to=1, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=14, predecessors=[5, 13, 7], from=0, to=2, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=11, predecessors=[5], from=1, to=1, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=15, predecessors=[11], from=1, to=2, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=6, predecessors=[], from=2, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
Node 1 state: LEADING
Node 1 final vote: Vote\{leader=1, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=17, predecessors=[6, 14, 7], from=0, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=18, predecessors=[17, 6, 7], from=0, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=19, predecessors=[18, 6, 7], from=0, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=20, predecessors=[18], from=1, to=0, leader=1, state=LEADING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=16, predecessors=[6], from=2, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=22, predecessors=[16, 20], from=1, to=2, leader=1, state=LEADING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=21, predecessors=[16], from=2, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
Node 0 state: FOLLOWING
Node 0 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}
Node 2 state: LEADING
Node 2 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}

2) There are some nodes that follow nodes other than the leaders:
NodeCrashEvent\{id=1, nodeId=0}
NodeStartEvent\{id=7, nodeId=0}
MessageEvent\{id=8, predecessors=[7], from=0, to=0, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=9, predecessors=[8, 7], from=0, to=1, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=10, predecessors=[9, 7], from=0, to=2, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=5, predecessors=[], from=1, to=0, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=12, predecessors=[5, 10, 7], from=0, to=0, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=13, predecessors=[12, 5, 7], from=0, to=1, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=14, predecessors=[5, 13, 7], from=0, to=2, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
Node 0 state: FOLLOWING
Node 0 final vote: Vote\{leader=1, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=11, predecessors=[5], from=1, to=1, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=15, predecessors=[11], from=1, to=2, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=6, predecessors=[], from=2, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=17, predecessors=[6, 7], from=0, to=2, leader=1, state=FOLLOWING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=16, predecessors=[6], from=2, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=19, predecessors=[16, 15], from=1, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=20, predecessors=[16, 19], from=1, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=22, predecessors=[16, 20], from=1, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=21, predecessors=[17, 19, 7], from=0, to=1, leader=1, state=FOLLOWING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=18, predecessors=[16], from=2, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
Node 1 state: FOLLOWING
Node 1 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}
Node 2 state: LEADING
Node 2 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}

3) Some nodes are neither following nor leading:
NodeCrashEvent\{id=3, nodeId=2}
NodeStartEvent\{id=7, nodeId=2}
MessageEvent\{id=8, predecessors=[7], from=2, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=9, predecessors=[8, 7], from=2, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=10, predecessors=[9, 7], from=2, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=5, predecessors=[], from=1, to=0, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=11, predecessors=[5], from=1, to=1, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=12, predecessors=[11], from=1, to=2, leader=1, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=13, predecessors=[12, 9], from=1, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=14, predecessors=[9, 13], from=1, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=15, predecessors=[9, 14], from=1, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=4, predecessors=[], from=0, to=0, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=16, predecessors=[4], from=0, to=1, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=17, predecessors=[16], from=0, to=2, leader=0, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=18, predecessors=[8, 17], from=0, to=0, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=19, predecessors=[8, 18], from=0, to=1, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
Node 2 state: LEADING
Node 2 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}
Node 1 state: FOLLOWING
Node 1 final vote: Vote\{leader=2, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=20, predecessors=[8, 19], from=0, to=2, leader=2, state=LOOKING, zxid=0, electionEpoch=1, peerEpoch=0}
MessageEvent\{id=21, predecessors=[20, 7], from=2, to=0, leader=2, state=LEADING, zxid=0, electionEpoch=1, peerEpoch=1}
44 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3335

Improve the usage of Collections

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 26/Mar/19 14:15   09/Apr/19 20:27 09/Apr/19 09:36   3.6.0 server   0 2 0 7200   bq. This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.

https://docs.oracle.com/javase/7/docs/api/java/util/ArrayDeque.html
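A short demonstration of {{ArrayDeque}} in both roles mentioned by the quote (illustrative code, not taken from the patch):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    // ArrayDeque as a LIFO stack (replacement for java.util.Stack).
    public static int stackTop() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        return stack.pop(); // last in, first out
    }

    // ArrayDeque as a FIFO queue (replacement for LinkedList).
    public static int queueHead() {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.offerLast(1);
        queue.offerLast(2);
        return queue.pollFirst(); // first in, first out
    }
}
```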
49 weeks, 2 days ago
ZooKeeper ZOOKEEPER-3334

./zkCli.sh cannot create the node

Bug Open Major Unresolved Unassigned maoling maoling 26/Mar/19 03:58   03/Dec/19 19:28       scripts   0 3   In zkCli.sh I cannot create a node and no exception is thrown, but using the Java client API I can create that path: "/configplatform/12". I saw the same issue previously.

[zk: (CONNECTED) 2] create /configplatform/12
[zk: (CONNECTED) 3] create /configplatform/12
[zk: (CONNECTED) 4] create /configplatform/12
[zk: (CONNECTED) 5] create /configplatform/12
[zk: (CONNECTED) 6] create /configplatform/12
[zk: (CONNECTED) 7] ls /configplatform
[11, 13, 3, 4, 5, 6, 7, 8, 9, 10]
[zk: (CONNECTED) 8] delete /configplatform/12
Node does not exist: /configplatform/12
[zk: (CONNECTED) 9] get /configplatform/12
Node does not exist: /configplatform/12
50 weeks, 6 days ago
ZooKeeper ZOOKEEPER-3333

Detect if txnlogs and / or snapshots is deleted under a running ZK instance

Improvement Open Major Unresolved Unassigned Norbert Kalmár Norbert Kalmár 25/Mar/19 07:21   28/Mar/19 13:16   3.5.5, 3.4.14   server   0 4   ZK does not notice if txnlogs are deleted from its dataDir; it will just keep running, writing txns to the buffer. Then, when ZK is restarted, it will lose all data.

To reproduce:
I ran a 3-node ZK ensemble, deleted the dataDir for just one instance, then wrote some data. It turns out it will not write the transaction to disk. ZK stores everything in memory until it “feels like” it's time to persist it to disk. So it doesn't even notice the file is deleted, and when it tries to flush, I imagine it just fails and keeps the data in the buffer.
So anyway, I restarted the instance, it got the snapshot + latest txn logs from the other nodes, as expected it would. It also wrote them in dataDir, so now every node had the dataDir.
So deleting from one node is fine (again, as expected, they will sync after a restart).

Then, I deleted all 3 nodes' dataDir under running instances. Until restart, it worked fine (of course the buffer was filling up; I did not test to the point of overflow).
But after restart, I got a fresh new ZK with all my znodes gone.

For starters, I think ZK should detect if the file it is appending to has been removed.
What should ZK do? At least log a warning. The question is: should it try to create a new file? Or try to get it from other nodes? Or just fail instantly? Restart itself and see if it can sync?
51 weeks ago
ZooKeeper ZOOKEEPER-3332

TxnLogToolkit should print multi transactions readably

Improvement Resolved Major Fixed maoling Toshihiro Suzuki Toshihiro Suzuki 23/Mar/19 13:06   12/Apr/19 23:01 12/Apr/19 13:08   3.6.0     0 4 0 4200   Currently, LogFormatter shows multi transactions like the following and it's not readable:
{code:java}
3/23/19 7:35:21 AM UTC session 0x3699141c4080020 cxid 0x21 zxid 0x1000002d9 multi v{s{1,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b000000010001f0005776f726c640006616e796f6e6500001c},s{5,#000312f726d73746f72652f5a4b524d5374617465526f6f742f414d524d546f6b656e5365637265744d616e61676572526f6f7400012a108ffffffe0fffffffdffffff92fffffff15128fffffff5ffffff9a731174ffffffa8ffffff86ffffffb40009},s{2,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b}}
{code}
Like delete and setData as the following, LogFormatter should print multi transactions readably:
{code:java}
3/22/19 7:20:48 AM UTC session 0x2699141c3f70022 cxid 0x885 zxid 0x1000002cc delete '/hbase-unsecure/region-in-transition/d6694b5f7ec2c45f6096fe373c8a34bc

3/22/19 7:20:50 AM UTC session 0x2699141c3f70024 cxid 0x47 zxid 0x1000002cd setData '/hbase-unsecure/region-in-transition/a9c6dac76ce74812196667ebc01dad51,#ffffffff0001a726567696f6e7365727665723a313630323035617afffffffa42ffffff94ffffffe81f5042554684123f53595354454d2e434154414c4f472c2c313535333233313233393533352e61396336646163373663653734383132313936363637656263303164616435312e18ffffffe9ffffffa8ffffff98ffffffa2ffffff9a2d2228a1c633132362d6e6f6465342e7371756164726f6e2d6c6162732e636f6d10ffffff947d18ffffffcffffffffbffffff96ffffffa2ffffff9a2d,2
{code}
48 weeks, 5 days ago 0|z010v4:
ZooKeeper ZOOKEEPER-3331

Automatically add IP authorization for Netty connections

New Feature Resolved Trivial Fixed Unassigned Brian Nixon Brian Nixon 22/Mar/19 18:23   06/May/19 15:51 06/May/19 12:32 3.6.0 3.6.0 server   0 3 0 1800   NIOServerCnxn automatically adds the client's address as an auth token under the "ip" scheme. Extend that functionality to the NettyServerCnxn as well to bring parity to the two approaches. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
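The parity change described above essentially means recording the client's remote address under the "ip" scheme when a Netty connection is created, as NIOServerCnxn already does. A self-contained sketch of the idea (the `AuthToken` class and `initialAuth` helper are illustrative, not ZooKeeper's actual `Id`/`NettyServerCnxn` types):

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class IpAuthSketch {
    public static final class AuthToken {
        public final String scheme;
        public final String id;
        public AuthToken(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }

    // On connection creation, record the peer's IP under the "ip" scheme so
    // ACLs like ip:10.0.0.5:rw can match, regardless of NIO vs Netty transport.
    public static List<AuthToken> initialAuth(InetSocketAddress peer) {
        List<AuthToken> auth = new ArrayList<>();
        auth.add(new AuthToken("ip", peer.getAddress().getHostAddress()));
        return auth;
    }
}
```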
45 weeks, 3 days ago 0|z010ew:
ZooKeeper ZOOKEEPER-3330

[zookeeper-docs]: the index.md navigation is broken

Bug Open Major Unresolved Unassigned wenshuai.zhang wenshuai.zhang 22/Mar/19 08:19   23/Mar/19 05:46   3.5.4   documentation   0 2 0 4200   The navigation links to xxx.html files that are not found; fix them to point to xxx.md. 100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
51 weeks, 6 days ago 0|z00znk:
ZooKeeper ZOOKEEPER-3329

ZooKeeper banner is too large to adapt to the window or console size

Improvement Resolved Minor Won't Fix maoling maoling maoling 19/Mar/19 04:03   14/Nov/19 05:51 14/Nov/19 05:51         0 1   2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:java.io.tmpdir=/var/folders/kj/092gpj_s2hvdgx77c9ghqdv00000gp/T/
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:java.compiler=<NA>
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:os.name=Mac OS X
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:os.arch=x86_64
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:os.version=10.13.6
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:user.name=wenba
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:user.home=/Users/wenba
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:user.dir=/Users/wenba/workspaces/workspace_zookeeper/zookeeper/zookeeper-server
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.free=223MB
2019-03-19 15:30:46,783 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.max=455MB
2019-03-19 15:30:46,784 [myid:] - INFO [main:Environment@109] - Client environment:os.memory.total=245MB
2019-03-19 15:30:46,788 [myid:] - INFO [main:ZooKeeper@871] - Initiating client connection, connectString=127.0.0.1:11222 sessionTimeout=5000 watcher=org.apache.zookeeper.test.ClientBase$CountdownWatcher@7e0ea639
2019-03-19 15:30:46,889 [myid:] - INFO [Thread-0:QuorumPeerConfig@141] - Reading configuration from: /Users/wenba/workspaces/workspace_zookeeper/zookeeper/zookeeper-server/target/surefire/test5571915780702939434.junit.dir/zoo.cfg
2019-03-19 15:30:46,889 [myid:] - INFO [Thread-0:QuorumPeerConfig@396] - clientPort is not set
2019-03-19 15:30:46,889 [myid:] - INFO [Thread-0:QuorumPeerConfig@420] - secureClientPortAddress is 0.0.0.0:11222
2019-03-19 15:30:46,889 [myid:] - INFO [Thread-0:QuorumPeerConfig@427] - observerMasterPort is not set
2019-03-19 15:30:46,889 [myid:] - INFO [Thread-0:QuorumPeerConfig@445] - metricsProvider.className is org.apache.zookeeper.metrics.impl.NullMetricsProvider
2019-03-19 15:30:46,890 [myid:] - INFO [Thread-0:ZooKeeperServerMain@121] - Starting server
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] -
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - ______ _
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - |___ / | |
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - / / ___ ___ | | __ ___ ___ _ __ ___ _ __
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - / / / _ \ / _ \ | |/ / / _ \ / _ \ | '_ \ / _ \ | '__|
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - / /__ | (_) | | (_) | | < | __/ | __/ | |_) | | __/ | |
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - /_____| \___/ \___/ |_|\_\ \___| \___| | .__/ \___| |_|
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - | |
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] - |_|
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:ZookeeperBanner@43] -
2019-03-19 15:30:46,906 [myid:] - INFO [Thread-0:Environment@109] - Server environment:zookeeper
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 2 days ago 0|z00uo0:
ZooKeeper ZOOKEEPER-3328

ZOOKEEPER-3245 misc metrics

Sub-task Resolved Minor Not A Problem Unassigned Jie Huang Jie Huang 18/Mar/19 11:19   19/Dec/19 17:59 27/Mar/19 13:05     metric system   0 3   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
51 weeks, 1 day ago 0|z00tls:
ZooKeeper ZOOKEEPER-3327

ZOOKEEPER-3245 Add unrecoverable error count

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 18/Mar/19 11:18   08/Jul/19 17:30 21/Mar/19 14:15   3.6.0 metric system   0 2 0 4800   100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year ago 0|z00tlk:
ZooKeeper ZOOKEEPER-3326

ZOOKEEPER-3245 Add session/connection related metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 18/Mar/19 11:17   22/Apr/19 18:42 22/Apr/19 13:20   3.6.0 metric system   0 2 0 4200   100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
47 weeks, 3 days ago 0|z00tlc:
ZooKeeper ZOOKEEPER-3325

ZOOKEEPER-3245 Add unavailable time metrics for quorum peers

Sub-task Resolved Minor Later Unassigned Jie Huang Jie Huang 18/Mar/19 11:16   19/Dec/19 18:01 19/Mar/19 12:23     metric system   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 2 days ago 0|z00tkw:
ZooKeeper ZOOKEEPER-3324

ZOOKEEPER-3245 Add read/write metrics for top level znodes

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 18/Mar/19 11:15   12/Apr/19 12:02 12/Apr/19 06:31   3.6.0 metric system   0 2 0 4800   These metrics provide bytes read from each branch under the root and bytes written to each branch under the root. We use top level znodes not only to manage applications that share an ensemble but also to organize data on a dedicated ensemble. These metrics help us to do quota management, ACL management, etc at the top znode level. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
48 weeks, 6 days ago 0|z00tko:
ZooKeeper ZOOKEEPER-3323

ZOOKEEPER-3245 Add TxnSnapLog metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 18/Mar/19 11:14   08/Jul/19 17:31 21/May/19 17:35   3.6.0 metric system   0 2 0 15600   100% 100% 15600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 1 day ago 0|z00tkg:
ZooKeeper ZOOKEEPER-3322

./zkServer.sh status fails when reconfig does not write the clientPort into zoo.cfg

Bug Resolved Major Implemented maoling maoling maoling 17/Mar/19 21:52   05/Jun/19 02:04 05/Jun/19 02:04         0 1   Look at my zoo.cfg, which does not have a clientPort value when reconfig is used:
*cat ../conf/zoo.cfg*
reconfigEnabled=true
dataDir=../../zkdata2
syncLimit=5
dataLogDir=../../zkdataLog2
initLimit=10
tickTime=2000
dynamicConfigFile=/data/software/zookeeper/zookeeper-test2/conf/zoo.cfg.dynamic.1f00000000

But look at the command "./zkServer.sh status"; it needs this clientPort value:
STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" "-Dzookeeper.log.file=${ZOO_LOG_FILE}" \
-cp "$CLASSPATH" $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
$clientPortAddress $clientPort srvr 2> /dev/null \

Otherwise, ./zkServer.sh status will fail.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
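When reconfig is enabled, the client port typically lives at the end of each server line in the dynamic config file (e.g. `server.1=host:2888:3888:participant;0.0.0.0:2181`) rather than as a `clientPort` key in zoo.cfg. A sketch of how a status check could fall back to parsing it from there (illustrative parsing only, not the actual zkServer.sh fix):

```java
public class DynamicClientPort {
    // Extract the client port from a dynamic config server line of the form
    // server.N=host:quorumPort:electionPort[:role];[clientAddress:]clientPort
    public static int clientPortOf(String serverLine) {
        int semi = serverLine.lastIndexOf(';');
        if (semi < 0) {
            throw new IllegalArgumentException("no client address section: " + serverLine);
        }
        String clientPart = serverLine.substring(semi + 1);
        // The client part may be just "2181" or "0.0.0.0:2181".
        int colon = clientPart.lastIndexOf(':');
        return Integer.parseInt(colon < 0 ? clientPart : clientPart.substring(colon + 1));
    }
}
```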
41 weeks, 1 day ago 0|z00sog:
ZooKeeper ZOOKEEPER-3321

ZOOKEEPER-3245 Add metrics for Leader

Sub-task Resolved Major Fixed Jie Huang Jie Huang Jie Huang 17/Mar/19 12:44   08/Jul/19 17:31 05/Jun/19 17:57   3.6.0 metric system   0 2 0 11400   100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks ago 0|z00si8:
ZooKeeper ZOOKEEPER-3320

Leader election port stops listening when the hostname is unresolvable for some time

Bug Closed Major Fixed Igor Skokov Igor Skokov Igor Skokov 17/Mar/19 04:21   16/Oct/19 14:59 13/Aug/19 07:36 3.4.10, 3.5.4 3.6.0, 3.5.6 leaderElection   0 5 0 39000   When trying to run a ZooKeeper 3.5.4 cluster on Kubernetes, I found that in some circumstances a ZooKeeper node stops listening on the leader election port. This causes unavailability of the ZK cluster.
ZooKeeper is deployed as a StatefulSet, in Kubernetes terms, and has the following dynamic configuration:

{code:java}
zookeeper-0.zookeeper:2182:2183:participant;2181
zookeeper-1.zookeeper:2182:2183:participant;2181
zookeeper-2.zookeeper:2182:2183:participant;2181
{code}


The bind address contains a DNS name generated by Kubernetes for each StatefulSet pod.
These DNS names become resolvable after the container starts, but with some delay. That delay causes the leader election port listener in the QuorumCnxManager.Listener class to stop.
The error happens in the QuorumCnxManager.Listener "run" method, which tries to bind the leader election port to a hostname that is not resolvable at that moment. The retry count is hard-coded and equals 3 (with a backoff of 1 second).

Zookeeper server log contains following errors:

{code:java}
2019-03-17 07:56:04,844 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1230] - Unexpected exception
java.net.SocketException: Unresolved address
at java.base/java.net.ServerSocket.bind(ServerSocket.java:374)
at java.base/java.net.ServerSocket.bind(ServerSocket.java:335)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:241)
at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
2019-03-17 07:56:04,844 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1261] - PeerState set to LOOKING
2019-03-17 07:56:04,845 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1136] - LOOKING
2019-03-17 07:56:04,845 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@893] - New election. My id = 1, proposed zxid=0x0
2019-03-17 07:56:04,846 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@687] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0xf (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-03-17 07:56:04,979 [myid:1] - INFO [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@892] - Leaving listener
2019-03-17 07:56:04,979 [myid:1] - ERROR [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@894] - As I'm leaving the listener thread, I won't be able to participate in leader election any longer: zookeeper-0.zookeeper:2183
{code}

This error happens on most nodes on cluster start, and ZooKeeper is unable to form a quorum. This leaves the cluster in an unusable state.
As far as I can see, the error is present on branches 3.4 and 3.5.
I think this error can be fixed by making the number of retries configurable (instead of the hard-coded value of 3).
Another way to fix this is to remove the retry limit entirely. Currently, the ZK server only stops the leader election listener and continues to serve on the other ports. Perhaps, if leader election halts, we should abort the process.
100% 100% 39000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
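The configurable-retry fix suggested above could look roughly like this self-contained sketch (the `Binder` interface and parameter names are illustrative; ZooKeeper's actual listener code differs):

```java
public class RetryingBind {
    @FunctionalInterface
    public interface Binder {
        void bind() throws Exception; // e.g. ServerSocket.bind on the election address
    }

    // Try to bind up to maxRetries times, sleeping backoffMs between attempts,
    // instead of giving up after a hard-coded 3 tries.
    public static boolean bindWithRetries(Binder binder, int maxRetries, long backoffMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                binder.bind();
                return true;
            } catch (Exception e) {
                if (attempt < maxRetries) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        return false; // caller decides: keep serving on other ports, or abort the process
    }
}
```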
31 weeks, 2 days ago 0|z00s9c:
ZooKeeper ZOOKEEPER-3319

ZOOKEEPER-3245 Add metrics for follower and observer

Sub-task Resolved Major Fixed Jie Huang Jie Huang Jie Huang 17/Mar/19 01:32   02/May/19 22:15 02/May/19 18:44   3.6.0 metric system   0 2 0 9000   100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 6 days ago 0|z00s80:
ZooKeeper ZOOKEEPER-3318

[CLI way] Add a complete built-in backup mechanism for ZooKeeper

New Feature In Progress Major Unresolved maoling maoling maoling 16/Mar/19 03:10   08/Aug/19 04:26       other   1 3 0 3600   We already have some workaround approaches for backup, e.g.:
Scenario 1: just write a cron shell script to copy the snapshots periodically.
Scenario 2: use an observer in the role of backup, then write the snapshots to a distributed file system (e.g. HDFS).

This issue aims to implement a complete built-in backup mechanism for ZooKeeper.
The initial proposal:
1. For realtime backup:
write a new CLI command: snapshot
1.1
[zk: 127.0.0.1:2180(CONNECTED) 0] snapshot backupDataDir
[zk: 127.0.0.1:2180(CONNECTED) 1] snapshot
***************************************************************************************************************
1.2
If no parameter is given, the default backupDataDir is the dataDir. The format of the backup snapshot is snapshot.f9f800002834, which is the same as the original one.
When recovering, move the snapshot.f9f800002834 file to the dataDir, then restart the ensemble.
1.3
Don't worry about exposing the takeSnap() API to the client. Look at these two references:
https://github.com/etcd-io/etcd/blob/master/clientv3/snapshot/v3_snapshot.go
https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/jedis/commands/BasicCommands.java#L68
2. For non-realtime backup:
2.1
write a new tool/shell script, zkBackup.sh, which is the reverse process of zkCleanup.sh, for non-realtime backup.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
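The non-realtime zkBackup.sh idea in 2.1 essentially means copying the newest snapshot.&lt;zxid&gt; file out of the dataDir, keeping its original name so it can simply be moved back on recovery. A self-contained sketch of that copy step (file layout and naming assumed from the description above, not the actual tool):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class SnapshotBackup {
    // Copy the newest snapshot.<zxid> file (zxid suffixes are hex, so compare
    // them numerically) from dataDir into backupDir, keeping the same name.
    public static Path backupLatest(Path dataDir, Path backupDir) throws IOException {
        try (Stream<Path> files = Files.list(dataDir)) {
            Path latest = files
                .filter(p -> p.getFileName().toString().startsWith("snapshot."))
                .max(Comparator.comparingLong((Path p) -> Long.parseLong(
                        p.getFileName().toString().substring("snapshot.".length()), 16)))
                .orElseThrow(() -> new IOException("no snapshot in " + dataDir));
            Files.createDirectories(backupDir);
            return Files.copy(latest, backupDir.resolve(latest.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```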
51 weeks ago 0|z00ruw:
ZooKeeper ZOOKEEPER-3317

Dynamic file for reconfig should support relative paths

Improvement Resolved Minor Invalid Unassigned maoling maoling 16/Mar/19 03:09   27/Aug/19 06:48 27/Aug/19 06:48     server   0 2   The dynamic file for reconfig should support relative paths, just like this:
*dynamicConfigFile=../zoo_replicated5.cfg.dynamic*
Follow the example of *dataDir*: if a relative path is used, log a warning.


2019-03-14 11:02:39,028 [myid:] - INFO [main:QuorumPeerConfig@141] - Reading configuration from: /data/software/zookeeper/zookeeper-test2/bin/../conf/zoo.cfg
2019-03-14 11:02:39,037 [myid:] - WARN [main:VerifyingFileFactory@59] - ../../zkdata2 is relative. Prepend ./ to indicate that you're sure!
2019-03-14 11:02:39,037 [myid:] - WARN [main:VerifyingFileFactory@59] - ../../zkdataLog2 is relative. Prepend ./ to indicate that you're sure!
2019-03-14 11:02:39,048 [myid:] - INFO [main:QuorumPeerConfig@406] - clientPortAddress is 0.0.0.0:22181
2019-03-14 11:02:39,048 [myid:] - INFO [main:QuorumPeerConfig@410] - secureClientPort is not set
2019-03-14 11:02:39,048 [myid:] - INFO [main:QuorumPeerConfig@427] - observerMasterPort is not set
2019-03-14 11:02:39,048 [myid:] - INFO [main:QuorumPeerConfig@445] - metricsProvider.className is org.apache.zookeeper.metrics.impl.NullMetricsProvider
2019-03-14 11:02:39,048 [myid:] - ERROR [main:QuorumPeerMain@94] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing ../zoo_replicated2.cfg.dynamic
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:187)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:118)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:87)
Caused by: java.io.FileNotFoundException: ../zoo_replicated2.cfg.dynamic (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:168)
... 2 more
Invalid config, exiting abnormally
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
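One way to support this would be to resolve a relative dynamicConfigFile against the directory of the static config file, whereas the stack trace above shows it being resolved against the working directory. A self-contained sketch of that resolution (illustrative only, not the actual QuorumPeerConfig change):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DynamicConfigResolver {
    // Resolve dynamicConfigFile relative to the static config's directory, so
    // "dynamicConfigFile=../zoo_replicated5.cfg.dynamic" works no matter which
    // directory the server was started from. Absolute paths pass through unchanged.
    public static Path resolve(Path staticConfig, String dynamicConfigFile) {
        Path dynamic = Paths.get(dynamicConfigFile);
        if (dynamic.isAbsolute()) {
            return dynamic;
        }
        return staticConfig.toAbsolutePath().getParent().resolve(dynamic).normalize();
    }
}
```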
29 weeks, 2 days ago 0|z00ruo:
ZooKeeper ZOOKEEPER-3316

Remove unused code in SyncRequestProcessor

Bug Resolved Minor Invalid Unassigned Jie Huang Jie Huang 15/Mar/19 14:18   19/Dec/19 17:59 04/Apr/19 19:49     server   0 1 0 10200   to make spotbugs happy 100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 days ago 0|z00rcw:
ZooKeeper ZOOKEEPER-3315

Exceptions in callbacks should be handleable by the application

Improvement Open Major Unresolved Unassigned Steven McDonald Steven McDonald 15/Mar/19 12:28   22/Mar/19 06:32           0 2   Hi,

In [KAFKA-7898|https://issues.apache.org/jira/browse/KAFKA-7898], a {{NullPointerException}} in a {{MultiCallback}} caused a Kafka cluster to become unhealthy in such a way that manual intervention was needed to recover. The cause of this particular {{NullPointerException}} is fixed in Kafka 2.2.x (with a proposed documentation update in [ZOOKEEPER-3314|https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3314]), but I am interested in improving the resiliency of Kafka (and by extension the Zookeeper client library) against such bugs.

Quoting the stack trace from KAFKA-7898:

{code}
[2019-02-05 14:28:12,525] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
java.lang.NullPointerException
at kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
{code}

The "caught unexpected throwable" message comes from [within the Zookeeper client library|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L641]. I think that try/catch is pointless, because removing it causes the message to instead be logged [here|https://github.com/apache/zookeeper/blob/release-3.4.13/src/java/main/org/apache/zookeeper/server/ZooKeeperThread.java#L60], with no discernable change in behaviour otherwise. Explicitly exiting the {{EventThread}} when this happens does not help (I don't think it gets restarted).

This is especially problematic with distributed applications, since they are generally designed to tolerate the loss of a node, so it is preferable to have the application be allowed to terminate itself rather than risk inconsistent state.

I am attaching a simple Zookeeper client which does nothing except throw a {{NullPointerException}} as soon as it receives a callback, to illustrate the problem. Running this results in:

{code}
232 [main-EventThread] ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
java.lang.NullPointerException
at ExceptionTest.process(ExceptionTest.java:31)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:539)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:514)
{code}

This comes from [here|https://github.com/apache/zookeeper/blob/7256d01a26412cd35a46edab6de9ac8c5adf5bb3/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L541], which simply logs the occurrence but provides no way for my application to handle the failure.

I suspect the best approach here might be to allow the application to register a callback to notify it of unhandlable exceptions within the Zookeeper library, since Zookeeper has no way of knowing what approach makes sense for the application. Of course, this is already technically possible in this case by having the application catch all exceptions in every callback, but that doesn't seem very practical.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
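The callback-registration idea in the last paragraph could look like this self-contained sketch: the application installs a handler, and the event dispatcher routes otherwise-unhandled callback exceptions to it instead of only logging them (names here are illustrative, not a proposed ZooKeeper API):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class CallbackExceptionSketch {
    private final AtomicReference<Consumer<Throwable>> handler =
            new AtomicReference<>(t -> {});

    // The application registers what should happen on an unhandlable callback
    // error, e.g. log and terminate so an orchestrator can restart the node.
    public void setUnhandledExceptionHandler(Consumer<Throwable> h) {
        handler.set(h);
    }

    // The dispatcher wraps every callback invocation; instead of swallowing
    // the Throwable with a log line, it hands it to the application.
    public void dispatch(Runnable callback) {
        try {
            callback.run();
        } catch (Throwable t) {
            handler.get().accept(t);
        }
    }
}
```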
51 weeks, 6 days ago 0|z00r6g:
ZooKeeper ZOOKEEPER-3314

Document the possibility of MultiCallback receiving a null pointer

Improvement Resolved Trivial Fixed Steven McDonald Steven McDonald Steven McDonald 15/Mar/19 07:38   09/Apr/19 20:27 09/Apr/19 08:37 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.4.11, 3.5.4, 3.4.12, 3.4.13 3.6.0     0 3 0 1800   A {{MultiCallback}} can receive a null pointer on failure, rather than a list of {{org.apache.zookeeper.OpResult.ErrorResult}} as documented. This is evident from [the implementation|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689].

This causes NullPointerExceptions in Kafka 2.1.x (see [KAFKA-7898|https://issues.apache.org/jira/browse/KAFKA-7898]). Kafka 2.0.x does not use the async multi interface, and Kafka 2.2.x handles the null pointer case.

However, this is enough of a hazard that it should be documented. I have a patch for that which I will try to attach in a moment (JIRA won't allow me to attach it now for some reason).
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch
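Until the documented behavior is clarified, applications can guard against the null result themselves. A sketch of a defensive processResult-style normalization (the class and method names are illustrative; the shape is modeled on the async multi callback described above):

```java
import java.util.Collections;
import java.util.List;

public class NullSafeMultiHandler {
    // Normalize the callback's opResults: on failure the client may pass null
    // instead of a list of ErrorResult, so treat null as "no per-op results".
    // The return code alone already indicates that the multi failed; this
    // avoids the NullPointerException that bit Kafka 2.1.x (KAFKA-7898).
    public static List<Object> safeResults(int rc, List<Object> opResults) {
        return opResults == null ? Collections.emptyList() : opResults;
    }
}
```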
49 weeks, 2 days ago 0|z00qrk:
ZooKeeper ZOOKEEPER-3313

ZOOKEEPER-3245 Upgrade a few metrics to percentile counter

Sub-task Resolved Minor Not A Problem Unassigned Jie Huang Jie Huang 14/Mar/19 19:07   19/Dec/19 17:59 17/Mar/19 01:31     metric system   0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 days ago 0|z00q3c:
ZooKeeper ZOOKEEPER-3312

Upgrade Jetty to 9.4.15.v20190215

Improvement Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 14/Mar/19 02:33   20/May/19 13:51 14/Mar/19 11:52 3.5.4, 3.6.0 3.6.0, 3.5.5 security, server   0 3 0 4200   100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week ago
Reviewed
0|z00p08:
ZooKeeper ZOOKEEPER-3311

Allow a delay to the transaction log flush

New Feature Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 13/Mar/19 20:04   22/May/19 13:16 22/May/19 05:39 3.6.0 3.6.0 server   0 3 0 7200   The SyncRequestProcessor flushes writes to disk either when 1000 writes are pending to be flushed or when the processor fails to retrieve another write from its incoming queue. The "flush when queue empty" condition operates poorly under many workloads as it can quickly degrade into flushing after every write -- losing all benefits of batching and leading to a continuous stream of flushes + fsyncs which overwhelm the underlying disk.
 
A configurable flush delay would ensure flushes do not happen more frequently than once every X milliseconds. This can be used in-place of or jointly with batch size triggered flushes.
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
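The combined trigger described above (batch full, or queue empty but never more often than the delay) reduces to a small predicate. A self-contained sketch, with names invented for illustration rather than taken from SyncRequestProcessor:

```java
public class FlushPolicy {
    // Flush when the batch is full, or when the incoming queue is empty AND at
    // least flushDelayMs has passed since the last flush. The delay prevents
    // the "flush after every write" degradation under a steady trickle of writes.
    public static boolean shouldFlush(int pendingWrites, boolean queueEmpty,
                                      long nowMs, long lastFlushMs,
                                      int maxBatchSize, long flushDelayMs) {
        if (pendingWrites == 0) {
            return false; // nothing to flush
        }
        if (pendingWrites >= maxBatchSize) {
            return true; // size-triggered flush
        }
        return queueEmpty && (nowMs - lastFlushMs) >= flushDelayMs;
    }
}
```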
43 weeks, 1 day ago 0|z00opc:
ZooKeeper ZOOKEEPER-3310

ZOOKEEPER-3245 Add metrics for prep processor

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 13/Mar/19 01:38   09/May/19 22:18 17/Apr/19 05:06   3.6.0 metric system   0 2 0 22200   100% 100% 22200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
48 weeks, 1 day ago 0|z00n54:
ZooKeeper ZOOKEEPER-3309

ZOOKEEPER-3245 Add sync processor metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 12/Mar/19 12:27   08/Jul/19 17:32 04/Jun/19 18:47   3.6.0 metric system   0 2 0 25200   100% 100% 25200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
41 weeks, 2 days ago 0|z00mco:
ZooKeeper ZOOKEEPER-3308

Is the tag parameter in Jute really needed?

Improvement Resolved Trivial Not A Problem Unassigned ougwen1235 ougwen1235 12/Mar/19 05:28   12/Mar/19 07:38 12/Mar/19 07:38 3.5.0   jute   0 1   {code:java}
public interface Record {
    public void serialize(OutputArchive archive, String tag)
        throws IOException;
    public void deserialize(InputArchive archive, String tag)
        throws IOException;
}
{code}

As above, the methods in the Record, OutputArchive, and InputArchive interfaces take a tag parameter, but the classes that implement these interfaces don't use the tag at all. Why do we need it?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 2 days ago 0|z00lnc:
ZooKeeper ZOOKEEPER-3307

Add DEBUG logs in zoo_* c client lib

Improvement Open Major Unresolved Unassigned prashant D prashant D 12/Mar/19 02:43   12/Mar/19 02:43           0 1   Adding DEBUG logs to the zoo_* C client library will be useful for debugging.

Applications sometimes get blocked on a ZooKeeper call for a long time.

Since we don't have DEBUG logs in the library, it is difficult to prove that the zoo_* calls are where the application got blocked.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 2 days ago 0|z00lds:
ZooKeeper ZOOKEEPER-3306

Node may not be accessible due to the inconsistent ACL reference map after SNAP sync

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 11/Mar/19 13:16   29/Apr/19 18:50 29/Apr/19 10:49 3.5.4, 3.6.0, 3.4.13 3.6.0 server   0 2 0 4800   This is a new bug we found in production.

ZooKeeper uses an ACL reference id and count to save space in the snapshot. During a fuzzy snapshot sync, the reference count may not be updated correctly, for example when the znode already exists.

When an ACL reference count reaches 0, the entry is deleted from the system, but there might actually be other nodes still using it. When a node with the deleted ACL id is visited, the request is rejected because the ACL no longer exists.

Here is the detailed flow for one of the scenarios:
# Server A starts a snap sync with the leader
# After the ACL map is serialized to Server A, there is a txn T1 that creates a node N1 with a new ACL_1 that did not exist in the ACL map
# On the leader, after this txn, the ACL map is ID1 -> (ACL_1, COUNT: 1), and the data tree has N1 -> ID1
# On Server A, the ACL map is empty, and the fuzzy snapshot has N1 -> ID1
# When replaying txn T1, it is skipped at the beginning since the node already exists, which leaves an empty ACL map while N1 references a non-existent ACL ID1
# Node N1 is not accessible because the ACL does not exist, and if Server A later becomes leader, all write requests are rejected as well with a marshalling error.

We're still working on the fix, suggestions are welcome.
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
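The core invariant behind this bug is that every znode referencing an ACL id must be counted, even when a replayed create is skipped because the node already exists. A minimal reference-counted cache sketch illustrating that invariant (a model of the idea only, not ZooKeeper's actual ACL cache implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class AclRefCountSketch {
    private final Map<Long, Integer> refCount = new HashMap<>();

    // Called whenever a znode starts referencing an ACL id, including when a
    // replayed create finds the node already present (the missed step in the
    // fuzzy-snapshot scenario above).
    public void addUsage(long aclId) {
        refCount.merge(aclId, 1, Integer::sum);
    }

    // Called when a referencing znode is deleted; the ACL entry may only be
    // purged once no znode references it.
    public void removeUsage(long aclId) {
        refCount.computeIfPresent(aclId, (id, c) -> c <= 1 ? null : c - 1);
    }

    public boolean isLive(long aclId) {
        return refCount.containsKey(aclId);
    }
}
```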
46 weeks, 3 days ago 0|z00kmg:
ZooKeeper ZOOKEEPER-3305

ZOOKEEPER-3245 Add Quorum Packet metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 10/Mar/19 18:48   08/Jul/19 17:32 06/May/19 11:49   3.6.0 metric system   0 2 0 12600   100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 3 days ago 0|z00jk0:
ZooKeeper ZOOKEEPER-3304

Maven build of "loggraph" is broken on branch-3.4

Bug Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 10/Mar/19 04:47   14/Mar/19 11:58 14/Mar/19 11:57 3.4.13 3.4.15 build, contrib   0 1 0 7800   Loggraph uses Jetty and dependency is missing in branch-3.4. 100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week ago 0|z00jao:
ZooKeeper ZOOKEEPER-3303

ZooKeeper Perl client zkperl doesn't compile on newer RHEL-family systems, e.g. Fedora

Bug Open Blocker Unresolved Unassigned Hari Sekhon Hari Sekhon 09/Mar/19 10:29   03/Feb/20 06:41   3.4.8, 3.4.12, 3.4.13   c client, contrib   0 3   Fedora 29 in Docker: the ZooKeeper Perl client zkperl fails to compile on Fedora 29 (it compiles OK on CentOS 7, though). I cannot build the project to get the zkperl dependencies running on Fedora as it stands. This happens on various ZooKeeper 3.4.x versions.
{code:java}
# perl Makefile.PL --zookeeper-include=/usr/local/include --zookeeper-lib=/usr/local/lib
Generating a Unix-style Makefile
Writing Makefile for Net::ZooKeeper
Writing MYMETA.yml and MYMETA.json

# make
Skip blib/lib/Net/ZooKeeper.pm (unchanged)
Running Mkbootstrap for ZooKeeper ()
chmod 644 "ZooKeeper.bs"
"/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- ZooKeeper.bs blib/arch/auto/Net/ZooKeeper/ZooKeeper.bs 644
gcc -c -I/usr/local/include -I. -D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g -DVERSION=\"0.36\" -DXS_VERSION=\"0.36\" -fPIC "-I/usr/lib64/perl5/CORE" ZooKeeper.c
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_acl_constant’:
ZooKeeper.c:784:7: warning: unused variable ‘RETVAL’ [-Wunused-variable]
AV * RETVAL;
^~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_CLONE’:
ZooKeeper.c:1089:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_CLONE_SKIP’:
ZooKeeper.c:1109:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_TIEHASH’:
ZooKeeper.c:1129:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_UNTIE’:
ZooKeeper.c:1151:5: warning: unused variable ‘ref_count’ [-Wunused-variable]
IV ref_count = (IV)SvIV(ST(1))
^~~~~~~~~
ZooKeeper.c:1150:17: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_SCALAR’:
ZooKeeper.c:1281:17: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_DELETE’:
ZooKeeper.c:1528:7: warning: unused variable ‘attr_key’ [-Wunused-variable]
SV * attr_key = ST(1)
^~~~~~~~
ZooKeeper.c:1527:17: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper_CLEAR’:
ZooKeeper.c:1561:17: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper attr_hash;
^~~~~~~~~
ZooKeeper.xs: In function ‘XS_Net__ZooKeeper_add_auth’:
ZooKeeper.xs:1206:30: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘STRLEN’ {aka ‘long unsigned int’} [-Wformat=]
Perl_croak(aTHX_ "invalid certificate length: %u", cert_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~
ZooKeeper.xs: In function ‘XS_Net__ZooKeeper_create’:
ZooKeeper.xs:1286:30: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘STRLEN’ {aka ‘long unsigned int’} [-Wformat=]
Perl_croak(aTHX_ "invalid data length: %u", buf_len);
^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~
ZooKeeper.xs:1321:21: error: format not a string literal and no format arguments [-Werror=format-security]
Perl_croak(aTHX_ err);
^~~~~~~~~~
ZooKeeper.xs: In function ‘XS_Net__ZooKeeper_set’:
ZooKeeper.xs:1760:30: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 3 has type ‘STRLEN’ {aka ‘long unsigned int’} [-Wformat=]
Perl_croak(aTHX_ "invalid data length: %u", buf_len);
^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~
ZooKeeper.xs: In function ‘XS_Net__ZooKeeper_set_acl’:
ZooKeeper.xs:1923:13: error: format not a string literal and no format arguments [-Werror=format-security]
Perl_croak(aTHX_ err);
^~~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_CLONE’:
ZooKeeper.c:2871:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_CLONE_SKIP’:
ZooKeeper.c:2891:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_TIEHASH’:
ZooKeeper.c:2911:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_UNTIE’:
ZooKeeper.c:2933:5: warning: unused variable ‘ref_count’ [-Wunused-variable]
IV ref_count = (IV)SvIV(ST(1))
^~~~~~~~~
ZooKeeper.c:2932:23: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Stat attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_SCALAR’:
ZooKeeper.c:3065:23: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Stat attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_STORE’:
ZooKeeper.c:3167:7: warning: unused variable ‘attr_val’ [-Wunused-variable]
SV * attr_val = ST(2)
^~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_DELETE’:
ZooKeeper.c:3271:7: warning: unused variable ‘attr_key’ [-Wunused-variable]
SV * attr_key = ST(1)
^~~~~~~~
ZooKeeper.c:3270:23: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Stat attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Stat_CLEAR’:
ZooKeeper.c:3304:23: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Stat attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_CLONE’:
ZooKeeper.c:3405:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_CLONE_SKIP’:
ZooKeeper.c:3425:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_TIEHASH’:
ZooKeeper.c:3445:9: warning: unused variable ‘package’ [-Wunused-variable]
char * package = (char *)SvPV_nolen(ST(0))
^~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_UNTIE’:
ZooKeeper.c:3467:5: warning: unused variable ‘ref_count’ [-Wunused-variable]
IV ref_count = (IV)SvIV(ST(1))
^~~~~~~~~
ZooKeeper.c:3466:24: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Watch attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_SCALAR’:
ZooKeeper.c:3599:24: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Watch attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_DELETE’:
ZooKeeper.c:3803:7: warning: unused variable ‘attr_key’ [-Wunused-variable]
SV * attr_key = ST(1)
^~~~~~~~
ZooKeeper.c:3802:24: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Watch attr_hash;
^~~~~~~~~
ZooKeeper.c: In function ‘XS_Net__ZooKeeper__Watch_CLEAR’:
ZooKeeper.c:3836:24: warning: variable ‘attr_hash’ set but not used [-Wunused-but-set-variable]
Net__ZooKeeper__Watch attr_hash;
^~~~~~~~~
cc1: some warnings being treated as errors
make: *** [Makefile:335: ZooKeeper.o] Error 1
{code}
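The two {{-Werror=format-security}} errors above come from passing a runtime string directly as the format argument of {{Perl_croak}}. A minimal sketch of the fix pattern follows; the function and variable names are illustrative, not the actual ZooKeeper.xs patch. In the XS code the equivalent one-line change would be {{Perl_croak(aTHX_ "%s", err);}} instead of {{Perl_croak(aTHX_ err);}}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the -Werror=format-security fix pattern (illustrative
 * names, not the actual ZooKeeper.xs patch). Passing a runtime string
 * as the format argument lets any '%' in the message be parsed as a
 * conversion specifier; routing it through a constant "%s" format
 * treats the message literally and silences the warning. */
static void report_error(char *out, size_t out_len, const char *err)
{
    snprintf(out, out_len, "%s", err);  /* constant format string */
}
```

The same pattern applies to both flagged call sites (lines 1321 and 1923 in the log above).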
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
6 weeks, 3 days ago 0|z00j20:
ZooKeeper ZOOKEEPER-3302

ZooKeeper C client does not compile on Fedora 29

Wish Resolved Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 09/Mar/19 08:12   09/Apr/19 20:27 09/Apr/19 04:08 3.6.0 3.6.0 c client   1 3 0 7200   I cannot build current master (git sha 372e713a9d2d9264417313e5d68e9437ffddd0f5)  with Fedora 29

 
{noformat}
gcc --version
gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
{noformat}
 

This is the error:
{code:java}
    [exec] gcc -DHAVE_CONFIG_H -I. -I/home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c  -I/home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/include -I/home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/tests -I/home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/generated   -Wall -Werror -Wdeclaration-after-statement -fprofile-arcs -ftest-coverage -g -O2 -D_GNU_SOURCE -MT cli.o -MD -MP -MF .deps/cli.Tpo -c -o cli.o `test -f 'src/cli.c' || echo '/home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/'`src/cli.c
     [exec] /home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/src/cli.c: In function ‘main’:
     [exec] /home/eolivelli/dev/zookeeper/zookeeper-client/zookeeper-client-c/src/cli.c:689:9: error: ‘strncpy’ specified bound 1024 equals destination size [-Werror=stringop-truncation]
     [exec]          strncpy(cmd, argv[2]+4, sizeof(cmd));
     [exec]          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     [exec] cc1: all warnings being treated as errors
     [exec] make: *** [Makefile:1155: cli.o] Error 1{code}
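GCC 8's {{-Wstringop-truncation}} fires here because the {{strncpy}} bound equals the destination size, so a source of exactly {{sizeof(cmd)}} bytes would leave the buffer unterminated. A hedged sketch of the usual fix (the helper and its names are illustrative, not the patch actually applied to cli.c):

```c
#include <assert.h>
#include <string.h>

/* Copy at most cmd_size-1 bytes and terminate explicitly. This avoids
 * both the GCC 8 warning and the latent unterminated-buffer bug that
 * strncpy(cmd, src, sizeof(cmd)) can produce. (Illustrative sketch.) */
static void copy_cmd(char *cmd, size_t cmd_size, const char *src)
{
    strncpy(cmd, src, cmd_size - 1);
    cmd[cmd_size - 1] = '\0';  /* always NUL-terminated */
}
```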
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
49 weeks, 2 days ago 0|z00izk:
ZooKeeper ZOOKEEPER-3301

Enforce the quota limit

New Feature In Progress Major Unresolved maoling maoling maoling 09/Mar/19 06:19   24/Feb/20 22:28     3.7.0     0 2 0 20400   We need a complete quota feature that actually enforces the limit, not just printing warning logs, which is of little practical use.

[zk: localhost:2181(CONNECTED) 18] setquota -n 2 /quota_test
[zk: localhost:2181(CONNECTED) 19] create /quota_test/child_1
Created /quota_test/child_1
[zk: localhost:2181(CONNECTED) 20] create /quota_test/child_2
Created /quota_test/child_2
[zk: localhost:2181(CONNECTED) 21] create /quota_test/child_3
Created /quota_test/child_3

look at the following logs:
2019-03-07 11:22:36,680 [myid:1] - WARN [SyncThread:0:DataTree@374] - Quota exceeded: /quota_test count=3 limit=2
2019-03-07 11:22:41,861 [myid:1] - WARN [SyncThread:0:DataTree@374] - Quota exceeded: /quota_test count=4 limit=2
100% 100% 20400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
45 weeks, 1 day ago 0|z00iyg:
ZooKeeper ZOOKEEPER-3300

ZOOKEEPER-3297 CLI: "history" should show the recent 10 commands, not 11

Sub-task Open Trivial Unresolved Rabi Kumar K C maoling maoling 09/Mar/19 06:08   14/Jan/20 11:22           0 1 0 4200   [zk: localhost:2181(CONNECTED) 13] history
3 - setAcl /ac auth:user1:password1:cdrwa
4 - getAcl /testwatch
5 - getAcl /testwatch
6 - getAcl /acl_digest_test
7 - setAcl /acl_digest_test digest:user1:+owfoSBn/am19roBPzR1/MfCblE=:crwad
8 - get /acl_digest_test
9 - getAcl /acl_digest_test
10 - addauth digest user1:12345
11 - getAcl /acl_digest_test
12 - get /acl_digest_test
13 - history
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 5 days ago 0|z00iy8:
ZooKeeper ZOOKEEPER-3299

"setquota -n|-b val path" needs brackets

Bug Resolved Trivial Not A Bug Unassigned maoling maoling 09/Mar/19 04:10   11/May/19 07:02 11/May/19 07:02         0 1 0 3600   What we want is "setquota [-n|-b] val path" 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 5 days ago 0|z00ivk:
ZooKeeper ZOOKEEPER-3298

ZOOKEEPER-3297 add a new CLI: "ll" to show the nodes vertically

Sub-task Open Major Unresolved maoling maoling maoling 09/Mar/19 04:00   19/Apr/19 08:21           0 1   The CLI "ls" shows the nodes horizontally; we also need a new CLI "ll" to show the nodes vertically, just like on Linux.

[zk: 127.0.0.1:22181(CONNECTED) 3] ls /
[a, admin, b, b1, barrier, brokers, cluster, consumers, controller_epoch, hbase, isr_change_notification]
e.g.
[zk: 127.0.0.1:22181(CONNECTED) 4] ll /
a
admin
b
b1
barrier
brokers
cluster
consumers
controller_epoch
hbase
isr_change_notification
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 5 days ago 0|z00ivc:
ZooKeeper ZOOKEEPER-3297

A list of improvements and bug fixes for the CLI of ZooKeeper

Task Open Major Unresolved Unassigned maoling maoling 09/Mar/19 03:50   09/Mar/19 04:14   3.6.0       0 1   ZOOKEEPER-3298, ZOOKEEPER-3300 A list of improvements and bug fixes for the CLI of ZooKeeper 100% 4200 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 5 days ago 0|z00iv4:
ZooKeeper ZOOKEEPER-3296

Cannot join quorum due to Quorum SSLSocket connection not closed explicitly when there is handshake issue

Bug Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 07/Mar/19 13:07   14/Jun/19 08:09 14/Jun/19 03:09 3.5.4, 3.6.0 3.6.0 server   0 2 0 8400   Recently, on prod ensembles, we saw some peers fail to connect to others due to timeouts when connecting to the other peers' leader election port. This was triggered by a network incident with lots of packet loss.

After investigation, we found it is because we do not close the socket explicitly when it times out during the SSL handshake in QuorumCnxManager.connectOne.

The quorum connection manager handles connections sequentially, with a default listen backlog queue size of 50. During the network loss there were socket read timeouts (syncLimit * tickTime), and almost all of the connect requests waiting in the backlog queue timed out on the remote side before being processed. Those timed-out learners then tried to connect to a different server, leaving their connect requests on the server side without ever sending a close_notify packet. The server slowly drains the queue, spending the full syncLimit * tickTime timeout on each request that never sent close_notify. New connect requests are queued again whenever a spot opens in the backlog, but they too time out before the server handles them, so the server can never finish any new connection and fails to join the quorum. The peers also leak file descriptors, because all those connections sit in CLOSE-WAIT state.
 
Restarting the servers to drain the listen backlog queue mitigated the issue.

Here are the steps to manually reproduce the issue:
# issuing two telnet connect to server A in the quorum without sending any packet
# stop all other servers
# start those again
# server A's reads time out on those telnet connections one by one, and it can no longer join the quorum
100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
39 weeks, 6 days ago 0|z00gu8:
ZooKeeper ZOOKEEPER-3295

bin/zkEnv.sh no need to check "$ZOOBINDIR"/../zookeeper-server/src/main/resources/lib/*.jar

Improvement Open Trivial Unresolved Unassigned liwenjie liwenjie 06/Mar/19 22:40   05/Feb/20 07:16   3.5.4 3.5.8 build   0 1   1. In a release build there is no "$ZOOBINDIR"/../zookeeper-server directory, let alone "$ZOOBINDIR"/../zookeeper-server/src/main/resources/lib/*.jar.

2. In the source tree there are no jars under "$ZOOBINDIR"/../zookeeper-server/src/main/resources/lib/.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 2 weeks ago 0|z00fpc:
ZooKeeper ZOOKEEPER-3294

bin/zkServer.sh no need to check "$ZOOBIN/../libexec/zkEnv.sh"

Improvement Open Trivial Unresolved Unassigned liwenjie liwenjie 06/Mar/19 22:19   11/Mar/19 16:22   3.5.4   build   0 1 0 1800   There is no "$ZOOBIN/../libexec", let alone "$ZOOBIN/../libexec/zkEnv.sh".

I think libexec is an artifact of the Hadoop Ant build.

 

bin/zkCleanup.sh, bin/zkCli.sh, bin/zkServer-initialize.sh, bin/zkServer.sh, and bin/zkTxnLogToolkit.sh all have the check for "$ZOOBIN/../libexec/zkEnv.sh".

 
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 2 weeks ago 0|z00fow:
ZooKeeper ZOOKEEPER-3293

ZooKeeper fails to compile on newer RHEL-family systems, e.g. Fedora

Bug Open Blocker Unresolved Unassigned Hari Sekhon Hari Sekhon 05/Mar/19 19:20   27/Feb/20 07:32   3.4.8, 3.4.12, 3.4.13       0 2   Fedora 29 in docker ZooKeeper fails to compile on Fedora 29 (compiles ok on CentOS 7 though). I cannot build the project to get the zkperl dependencies to run on Fedora as it is. This happens on various versions of ZooKeeper 3.4.x
{code:java}
cd zookeeper-3.4.8/src/c
./configure
make
make all-am
make[1]: Entering directory '/github/nagios-plugins/zookeeper-3.4.13/src/c'
/bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o
src/zookeeper.c: In function ‘format_endpoint_info’:
src/zookeeper.c:3506:21: error: ‘%d’ directive writing between 1 and 5 bytes into a region of size between 0 and 127 [-Werror=format-overflow=]
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~
src/zookeeper.c:3506:17: note: directive argument in the range [0, 65535]
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~~~~~~
src/zookeeper.c:3506:5: note: ‘sprintf’ output between 3 and 134 bytes into a destination of size 128
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:955: zookeeper.lo] Error 1
make[1]: Leaving directory '/github/nagios-plugins/zookeeper-3.4.13/src/c'
make: *** [Makefile:631: all] Error 2
{code}
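The overflow warning is precise: {{addrstr}} can be up to 127 bytes, and appending {{":"}} plus a port of up to 65535 can exceed the 128-byte destination. A bounded {{snprintf}} is the usual fix; the sketch below is illustrative (it assumes the 128-byte buffer named in the diagnostic, and uses {{%u}} for the {{ntohs}} result):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bounded replacement for sprintf(buf, "%s:%d", addrstr, ntohs(port)).
 * snprintf truncates instead of overflowing and always NUL-terminates.
 * (Illustrative sketch of the fix pattern, not the applied patch.) */
static void format_endpoint(char *buf, size_t buf_len,
                            const char *addrstr, unsigned port)
{
    snprintf(buf, buf_len, "%s:%u", addrstr, port);
}
```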
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 5 days ago 0|z00dfc:
ZooKeeper ZOOKEEPER-3292

ZooKeeper C Client for Windows: should include winports.h

Bug Open Blocker Unresolved Unassigned David Vujic David Vujic 27/Feb/19 10:11   27/Feb/19 10:11   3.4.13   c client   0 1   Windows 10

CMake

ZooKeeper 3.4.13 
When building the C client on Windows with CMake:

cmake -DWANT_SYNCAPI=OFF -DCMAKE_GENERATOR_PLATFORM=x64

 

With this input, the header file winports.h will not be included in these files:

*zk_log.c*

*zk_adaptor.h*

Also, I think winports.h should be added to *zookeeper.c*

 

Without winports.h, compilation will fail on Windows. The errors are about strtok_r and localtime_r - the Windows mappings in winports.h are missing.

I am guessing that other important includes are missing too (like Windows Sockets).

 

One solution could be to extract the winports.h include out from the THREADED preprocessor, to a separate one:

{code}
#ifdef WIN32
#include "winport.h"
#endif
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 1 day ago 0|z00568:
ZooKeeper ZOOKEEPER-3291

improve error message when JAVA_HOME is set to the wrong value

Improvement Resolved Minor Fixed Unassigned Mogens Heller Grabe Mogens Heller Grabe 26/Feb/19 03:10   15/Mar/19 04:04 15/Mar/19 01:35 3.4.12 3.6.0 scripts   0 2 0 1200   Windows This is small (Windows-based) developer usability improvement.

When the {{JAVA_HOME}} environment variable is set, but the value is wrong (so that {{JAVA_HOME}} + {{/bin/java.exe}} does not point correctly to {{java.exe}}), the startup script will simply fail with the message

{{Error: JAVA_HOME is incorrectly set.}}

which is a bummer. 😞

With this tiny change, the error message will be much friendlier:

{{Error: JAVA_HOME is incorrectly set: C:\Program Files\Java\jre1.8.0_201\bin}}
{{Expected to find java.exe here: C:\Program Files\Java\jre1.8.0_201\bin\bin\java.exe}}

(in this case showing a situation where one has inadvertently included {{/bin}} in the {{JAVA_HOME}} environment variable).

This will also give a nicer error message in situations where the JRE has been updated and the one pointed to by {{JAVA_HOME}} has been uninstalled.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 days ago 0|z00348:
ZooKeeper ZOOKEEPER-3290

Throw detailed KeeperException when a transaction failed

Improvement In Progress Major Unresolved Zili Chen Zili Chen Zili Chen 26/Feb/19 03:10   19/Sep/19 20:50   3.5.4, 3.4.13 4.0.0 server   0 2 0 22800   Assume we execute the following statements:
{code:java}
ZooKeeper zk = ...;
zk.multi(Arrays.asList(
Op.check(path1, -1),
Op.delete(path2, -1)));
{code}

If path1 or path2 didn't exist, we get a {{KeeperException.NoNodeException}} without any indication of which of them does not exist.

The reason is that when {{PrepRequestProcessor#pRequest}} runs (around line 804), it catches a {{KeeperException.NoNodeException}} that contains the path info.

However, the generated {{ErrorTxn}} contains only the {{err}} field holding the error code, so the path info is lost. A reasonable resolution might be to extend {{ErrorTxn}} to contain the path info, or a general {{data}} byte array.
100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 2 days ago 0|z00340:
ZooKeeper ZOOKEEPER-3289

Throw KeeperException with path in DataTree operations

Improvement Resolved Major Not A Problem Unassigned Zili Chen Zili Chen 25/Feb/19 00:19   02/Apr/19 06:35 26/Feb/19 03:12 3.5.4, 3.4.13   server   0 2   Currently, if ZooKeeper deletes a znode that does not exist, it throws a {{KeeperException.NoNodeException}} without a path message. This makes debugging against ZooKeeper difficult. For example,

Assume we try to do a transaction (with Curator's encapsulation):

{code:java}
client.inTransaction()
.check().forPath(path1).and()
.delete().forPath(path2).and()
.commit()
{code}


If the statement throws a {{KeeperException.NoNodeException}} without path information, we can hardly tell whether it failed at {{check}} or at {{delete}}.

Thus I propose throwing KeeperException with the path included in DataTree operations. We can achieve this with little overhead.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 2 days ago 0|z00188:
ZooKeeper ZOOKEEPER-3288

ZOOKEEPER-3282 add a new doc:ZookeeperCLI.md

Sub-task Resolved Major Fixed maoling maoling maoling 24/Feb/19 01:09   14/Jun/19 22:11 14/Jun/19 17:21   3.6.0 documentation   0 2 0 9000      Write Zookeeper CLI[3.6], which includes the:

 - about how to use the zk command line interface [./zkCli.sh]

   e.g ls /; get ; rmr;create -e -p etc.......

 - look at an example from redis: [https://redis.io/topics/rediscli]
100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
39 weeks, 5 days ago 0|z000o8:
ZooKeeper ZOOKEEPER-3287

admin command to dump currently known ACLs

New Feature Open Trivial Unresolved Unassigned Brian Nixon Brian Nixon 21/Feb/19 23:30   21/Feb/19 23:30   3.6.0   server   0 1   Add a new command to dump the set of ACLs currently applied on the data tree.

 

Used by an admin to check what controls are being set for an ensemble. A flat list with no connection to the data will suffice - will have to think whether any details ought to be emitted as a cryptographic hash to preserve secrecy.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 6 days ago 0|yi19aw:
ZooKeeper ZOOKEEPER-3286

xid wrap-around causes connection loss/segfault when hitting predefined XIDs

Bug Open Minor Unresolved Unassigned Christian Czezatke Christian Czezatke 21/Feb/19 17:00   05/Feb/20 07:16   3.5.4 3.5.8 c client   0 1 0 9000   CentOS 7.2 *Description:*

The get_xid functions in mt_adaptor.c/st_adaptor.c return a 32 bit signed integer that is initialized to the current unix epoch timestamp on startup.

This counter will eventually wrap around, which is not a problem per se, since the client does not expect XID values to monotonically increase: It just verifies that replies to operations come back in order by checking the XID of a request received against the next XID expected. (zookeeper.c:zookeeper_process).

However, after a wrap-around the XID values will eventually collide with the reserved XIDs as defined in zk_adaptor.h:
* The first collision will be with SET_WATCHES_XID (-8): The reply to the request that happens to get tagged with -8 will be misinterpreted as a reply to SET_WATCHES. This causes the client to see a connection timeout.
* The next collision will be with AUTH_XID (-4): At that point the client will segfault, when mis-interpreting the reply:

#0  0x0000000000407645 in auth_completion_func (zh=0x61d010, rc=0) at src/zookeeper.c:1823
#1  zookeeper_process (zh=zh@entry=0x61d010, events=<optimized out>) at src/zookeeper.c:2896
#2  0x000000000040c34c in do_io (v=0x61d010) at src/mt_adaptor.c:451
#3  0x00007ffff7bc8dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75f573d in clone () from /lib64/libc.so.6

I hit this with a busy C client that runs for a very long time (months). Also, when a client spins in a tight loop trying to submit more operations after a connection loss, even for a short period of time, the xid values increment very quickly.

 

*Proposed patch:*

To avoid introducing any additional locking, this can be solved by just masking out the MSB in the xid returned by get_xid. Effectively this prevents the returned XID from ever going negative.

To avoid a race when the static xid variable eventually hits -1 after a wrap-around, I propose not initializing xid with the result of time(0) on startup; this is not needed. This also means that the get_xid function in mt_adaptor.c no longer needs to be flagged as a constructor.
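The masking idea can be sketched as follows (an illustrative, single-threaded version; the real get_xid in mt_adaptor.c increments the counter under the appropriate synchronization, and the attached patch may differ in detail):

```c
#include <assert.h>
#include <stdint.h>

/* Clearing the most significant bit means the returned xid can never
 * be negative, so it can never collide with the reserved values in
 * zk_adaptor.h such as SET_WATCHES_XID (-8) or AUTH_XID (-4). */
static int32_t mask_xid(uint32_t raw)
{
    return (int32_t)(raw & 0x7fffffffu);  /* clear the sign bit */
}

/* Starting the counter at 1 instead of time(0) removes the need to
 * flag get_xid as a constructor. (Illustrative sketch only.) */
static int32_t get_xid(void)
{
    static uint32_t next_xid = 1;  /* no time(0) seeding needed */
    return mask_xid(next_xid++);
}
```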

 Proposed patch is attached.

 

I ran into this on zookeeper 3.5.4 but other versions are likely affected as well.
100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch
1 year, 4 weeks ago 0|yi1900:
ZooKeeper ZOOKEEPER-3285

ZOOKEEPER-3021 Move assembly into its own sub-module

Sub-task Closed Blocker Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 21/Feb/19 08:27   20/May/19 13:50 26/Feb/19 11:07 3.5.4, 3.6.0 3.6.0, 3.5.5 build, scripts   0 2 0 13200   In order to create a "convenience tar", it is better to create a separate sub-module for assembly: if it lives in the parent pom it is built first, and the binaries will not necessarily be available at that time (only if they were built previously). Even when a binary is available, the artifact cannot be referenced from the assembly descriptor.

Bonus: add automatic checksum generation for every artifact created by maven. sha512 should be used. md5 is deprecated and sha1 is only 20 bytes.

For now, I will not backport this to 3.4
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 2 days ago 0|yi18c0:
ZooKeeper ZOOKEEPER-3284

Tests fails on 3.4 if running with surefire

Bug Open Major Unresolved Unassigned Norbert Kalmár Norbert Kalmár 21/Feb/19 08:24   21/Feb/19 08:24   3.4.14   tests   0 1   Some tests fail consistently when run by Maven. The failures are timeouts waiting for the ZK server to start or for the client to connect. Something looks different in the JUnit runner; possibly a different version of something is on the classpath. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks ago 0|yi18bk:
ZooKeeper ZOOKEEPER-3283

ZOOKEEPER-3282 add the new doc: zookeeperClients.md

Sub-task In Progress Major Unresolved maoling maoling maoling 19/Feb/19 06:53   02/Sep/19 04:12       documentation   0 1 0 600   1.2 write Clients[2.4], which includes the: 
      1.2.1 C client 
      1.2.2 zk-python, kazoo
      1.2.3 Curator etc.......
      look at an example from: https://redis.io/clients
100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 2 days ago 0|yi14p4:
ZooKeeper ZOOKEEPER-3282

a big refactor of the documentation

Task In Progress Major Unresolved maoling maoling maoling 19/Feb/19 06:50   04/Aug/19 08:48       documentation   0 1   ZOOKEEPER-3283, ZOOKEEPER-3288, ZOOKEEPER-3529, ZOOKEEPER-3616 Hi guys:

I am working on a big refactor of the documentation. It aims to:

- 1. Make a better reading experience and help users learn more about ZooKeeper quickly, as good as other projects' docs (e.g. Redis, HBase).

- 2. Diverge as little as possible from the original docs.

- 3. Solve the problem that new features or improvements sometimes have no good place to be documented.

 

The new catalog may look like this:

* is new one added.

** is the one to keep unchanged as far as possible.

*** is the one modified.

--------------------------------------------------------------

|---Overview

   |---Welcome ** [1.1]

   |---Overview ** [1.2]

   |---Getting Started ** [1.3]

   |---Release Notes ** [1.4]

|---Developer

   |---API *** [2.1]

   |---Programmer's Guide ** [2.2]

   |---Recipes *** [2.3]

   |---Clients * [2.4]

   |---Use Cases * [2.5]

|---Admin & Ops

   |---Administrator's Guide ** [3.1]

   |---Quota Guide ** [3.2]

   |---JMX ** [3.3]

   |---Observers Guide ** [3.4]

   |---Dynamic Reconfiguration ** [3.5]

   |---Zookeeper CLI * [3.6]

   |---Shell * [3.7]

   |---Configuration flags * [3.8]

   |---Troubleshooting & Tuning  * [3.9]

|---Contributor Guidelines

   |---General Guidelines * [4.1]

   |---ZooKeeper Internals ** [4.2]

|---Miscellaneous

   |---Wiki ** [5.1]

   |---Mailing Lists ** [5.2]

--------------------------------------------------------------

The Roadmap is:

1. (I'll pick this one up :D)

 1.1 write API[2.1], which includes the:

   1.1.1 The original API docs, which are auto-generated Javadoc; just give a link.

   1.1.2. Restful-api (the apis under the /zookeeper-contrib-rest/src/main/java/org/apache/zookeeper/server/jersey/resources)

 1.2 write Clients[2.4], which includes the:

     1.2.1 C client

     1.2.2 zk-python, kazoo

     1.2.3 Curator etc.......

     look at an example from: https://redis.io/clients




# write Recipes[2.3], which includes the:

 - integrate "Java Example" and "Barrier and Queue Tutorial" into it (since the examples have some bugs and are obsolete, we may delete some parts).

 - suggest users use Curator's recipe implementations and link to Curator's recipes doc.

 
# write Zookeeper CLI[3.6], which includes the:

 - about how to use the zk command line interface [./zkCli.sh]

   e.g ls /; get ; rmr;create -e -p etc.......

 - look at an example from redis: https://redis.io/topics/rediscli

 
# write shell[3.7], which includes the:

  - list all usages of the shells under the zookeeper/bin. (e.g zkTxnLogToolkit.sh,zkCleanup.sh)

 
# write Configuration flags[3.8], which includes the:

  - list all usages of configurations properties(e.g zookeeper.snapCount):

  - move the original Advanced Configuration part of zookeeperAdmin.md into it.

    look at an example from:https://coreos.com/etcd/docs/latest/op-guide/configuration.html

  
# write Troubleshooting & Tuning[3.9], which includes the:

  - move the original "Gotchas: Common Problems and Troubleshooting" part of Administrator's Guide.md into it.

  - move the original "FAQ" into it.

  - add some new contents (e.g https://www.yumpu.com/en/document/read/29574266/building-an-impenetrable-zookeeper-pdf-github).

  look at an example from:https://redis.io/topics/problems

                            https://coreos.com/etcd/docs/latest/tuning.html

 
# write General Guidelines[4.1], which includes the:

 - move the original "Logging" part of ZooKeeper Internals into it as the logger specification.

 - write specifications about code, git commit messages,github PR  etc ...

   look at an example from:

   http://hbase.apache.org/book.html#hbase.commit.msg.format

 
# write Use Cases[2.5], which includes the:

 - just move the content from: https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy into it.

 - add some new content (e.g. Apache projects: Spark; companies: Twitter, Facebook).

 

--------------------------------------------------------------

BTW:

- Any insights or suggestions are very welcome. After the discussions, I will create a series of tickets (an umbrella).

- Since these tasks can be done in parallel, if you are interested in one, please don't hesitate: assign it to yourself and pick it up. (Note: give me a ping to avoid duplicated work.)
100% 24600 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 2 days ago 0|yi14oo:
ZooKeeper ZOOKEEPER-3281

Add a new CLI:watch

New Feature Resolved Major Won't Fix maoling maoling maoling 17/Feb/19 20:51   18/Nov/19 21:03 12/Nov/19 05:09 3.6.0   scripts   0 1 0 4200   Terminal1:t1;
Terminal2:t2;

PART1:
--------------[-d] test for data change------------------------
[t1]:
watch -d /testwatch
[t2]:
set /testwatch mydata
[t1]: result:
WatchedEvent state:SyncConnected
type:NodeDataChanged
path:/testwatch
new data:mydata
----------------------------------------------------------------
[t1]:
watch -d /testwatch
[t2]:
delete /testwatch
[t1] result:
WatchedEvent state:SyncConnected
type:NodeDeleted
path:/testwatch

PART2:
--------------[-c] test for child change------------------------
[t1]:
watch -c /testwatch
[t2]
create /testwatch/child_1 mydata
[t1] result:
WatchedEvent state:SyncConnected
type:NodeChildrenChanged
path:/testwatch
new child list:[child_1]
----------------------------------------------------------------
[t1]:
watch -c /testwatch
[t2]:
delete /testwatch/child_1
[t1]:
WatchedEvent state:SyncConnected
type:NodeChildrenChanged
path:/testwatch
new child list:[]

PART3:
----------------[-e]test for exist watch----------------------
[t2]:
delete /testwatch
[t1]:
watch -e /testwatch
[t2]:
create /testwatch mydata
[t1] result:
WatchedEvent state:SyncConnected
type:NodeCreated
path:/testwatch
----------------------------------------------------------------
[t1]:
watch -e /testwatch
[t2]:
delete /testwatch
WatchedEvent state:SyncConnected
type:NodeDeleted
path:/testwatch
----------------------------------------------------------------
[t1]:
watch -e /testwatch
[t2]:
set /testwatch mydata666666666
[t1]:
WatchedEvent state:SyncConnected
type:NodeDataChanged
path:/testwatch

----------------------------------------------------------------
a test for watching a non-existent key
[t1]:
watch -d /non-existent_key
Node does not exist: /non-existent_key
watch -c /non-existent_key
Node does not exist: /non-existent_key
watch -e /non-existent_key
[t2]:
create /non-existent_key mydata
[t1]:
WatchedEvent state:SyncConnected
type:NodeCreated
path:/non-existent_key
----------------------------------------------------------------
the test for other watchedEvent state: e.g. Disconnected
[t1]:
watch -c /testwatch
#kill the zk server
WatchedEvent state:Disconnected
type:None
path:null
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 3 days ago 0|yi12qg:
ZooKeeper ZOOKEEPER-3280

ClientCnxn xid rollover can break sessions

Bug Resolved Major Duplicate Unassigned Jonathan Park Jonathan Park 15/Feb/19 16:49   19/Feb/19 16:22 15/Feb/19 17:05 3.4.6, 3.4.12   java client   0 5    
{code:java}
2019-02-15 13:40:21,471 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth sessionid:0x168f2c5e9c60017
2019-02-15 13:40:21,472 [myid:] - WARN  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 for a packet with details: clientPath:null serverPath:null finished:false header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{}
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2019-02-15 13:40:22,520 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2019-02-15 13:40:22,521 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-02-15 13:40:22,521 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session establishment request sent on localhost/127.0.0.1:2181
2019-02-15 13:40:22,522 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x168f2c5e9c60017, negotiated timeout = 30000
2019-02-15 13:40:22,525 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping response for sessionid: 0x168f2c5e9c60017 after 235329552ms
{code}
ClientCnxn xids are tracked as Java ints. For long-lived ZK clients this can lead to rollover into the negative xid space. Xid = -4 is treated as a special xid reserved for auth requests. With xid rollover, a normal ZK request can also have xid = -4, but its response will be treated as an auth response, making subsequent packet processing fail with the exception above. We can reproduce this more readily by changing the starting xid in ClientCnxn from 1 to -100. The ZK client will transparently reconnect and establish a new session, but features that depended on the same session persisting will unnecessarily experience a disconnected event.
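The rollover mechanics can be sketched outside of ZooKeeper (class and method names here are illustrative, not the actual client code):

```java
public class XidRollover {
    // Reserved value described above: responses carrying this xid are
    // treated as auth responses by the client.
    static final int AUTH_XID = -4;

    // Mirrors the effect of the client's xid++: a plain Java int
    // increment silently wraps past Integer.MAX_VALUE.
    static int nextXid(int xid) {
        return xid + 1;
    }

    public static void main(String[] args) {
        // Incrementing past the int maximum wraps into negative space...
        System.out.println(nextXid(Integer.MAX_VALUE)); // Integer.MIN_VALUE
        // ...and further increments eventually land on the reserved -4.
        System.out.println(nextXid(-5) == AUTH_XID);    // true
    }
}
```

This is also why starting the counter near the negative reserved values, as the reporter suggests, reproduces the collision within a handful of requests instead of after roughly 2^31 of them.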

 

I've attached a simple class with a main() method that reproduces the failure quickly against a local ZK server after modifying the initial value of ClientCnxn.xid from 1 to -100.

 
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 4 weeks, 2 days ago 0|yi11h4:
ZooKeeper ZOOKEEPER-3279

Maven tests fail on branch-3.4

Bug Open Major Unresolved Unassigned Enrico Olivelli Enrico Olivelli 14/Feb/19 06:25   14/Feb/19 06:25   3.4.14   build, security, tests   0 1   It seems that the Maven build lacks a dependency on branch-3.4.

I have these errors while testing the 3.4.14 RC 1 source tarball

[ERROR] org.apache.zookeeper.server.quorum.auth.ApacheDSMiniKdcTest  Time elapsed: 1.161 s  <<< ERROR!
java.lang.NoClassDefFoundError: jdbm/helper/CachePolicy
Caused by: java.lang.ClassNotFoundException: jdbm.helper.CachePolicy
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks ago 0|yi0zag:
ZooKeeper ZOOKEEPER-3278

Maven sources are not buildable if not linked to a git repository

Bug Open Major Unresolved Unassigned Enrico Olivelli Enrico Olivelli 14/Feb/19 05:43   14/Feb/19 05:43   3.6.0, 3.5.5, 3.4.14   build   0 1   [ERROR] Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.2.5:revision (find-current-git-revision) on project zookeeper-server: .git directory is not found! Please specify a valid [dotGitDirectory] in your pom.xml -> [Help 1] 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks ago 0|yi0z8w:
ZooKeeper ZOOKEEPER-3277

Add trace listener in NettyServerCnxnFactory only if trace logging is enabled

Improvement Closed Trivial Fixed Ilya Maykov Ilya Maykov Ilya Maykov 13/Feb/19 17:09   20/May/19 13:50 15/Feb/19 09:37 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 2400   100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 6 days ago 0|yi0yk8:
ZooKeeper ZOOKEEPER-3276

Make X509UtilTest.testCreateSSLServerSocketWithPort less flaky

Improvement Closed Trivial Fixed Ilya Maykov Ilya Maykov Ilya Maykov 11/Feb/19 19:10   20/May/19 13:50 12/Feb/19 09:34 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 1800   Saw a Jenkins test failure where the port returned by PortAssignment.unique() was in use. Theory: by setting custom cipher suites before getting the free port, we will make it less likely that another thread/process grabs the port under us, thus making the test failure less likely. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks, 2 days ago 0|yi0vlk:
ZooKeeper ZOOKEEPER-3275

ZOOKEEPER-3021 Fix release targets: package, tar, mvn-deploy

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 08/Feb/19 12:34   02/Apr/19 06:40 13/Feb/19 09:48 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build   0 3 0 15000   We changed the folder structure as part of the Maven migration, but the Ant build targets related to releasing ZooKeeper haven't been fully updated. 100% 100% 15000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks, 1 day ago 0|yi0sv4:
ZooKeeper ZOOKEEPER-3274

Use CompositeByteBuf to queue data in NettyServerCnxn

Improvement Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 06/Feb/19 18:04   20/May/19 13:50 18/Feb/19 11:06 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 7200   100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 3 days ago 0|yi0qdk:
ZooKeeper ZOOKEEPER-3273

Sync BouncyCastle version in Maven build and Ant build

Improvement Closed Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 06/Feb/19 16:40   20/May/19 13:50 07/Feb/19 05:20 3.6.0, 3.5.5 3.6.0, 3.5.5 build   0 2 0 2400   Maven uses 1.60, Ant uses 1.56 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks ago 0|yi0q9c:
ZooKeeper ZOOKEEPER-3272

Clean up netty4 code per Norman Maurer's review comments

Improvement Closed Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 06/Feb/19 14:19   20/May/19 13:50 15/Feb/19 09:34 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 26400   100% 100% 26400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 4 weeks, 6 days ago 0|yi0q48:
ZooKeeper ZOOKEEPER-3271

Add new maven profile for PR tests

Improvement Resolved Major Invalid Norbert Kalmár Norbert Kalmár Norbert Kalmár 06/Feb/19 08:20   13/Feb/19 09:16 13/Feb/19 09:16 3.6.0, 3.5.5, 3.4.14   build, tests   0 1   We already have two Maven profiles:
- full build - builds and tests everything, including contrib, c-client
- java-build - only builds and tests java server, client and recipes

For the PR test run, we should add a third one, that excludes contrib from the full-build profile.

As Enrico suggested, we could have separate jobs for testing C and Java:
- test java -> run with java-build profile
- test C -> run with newly added c-client profile
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks, 1 day ago 0|yi0pnc:
ZooKeeper ZOOKEEPER-3270

Publish metrics for connected clients with user stats

New Feature Open Major Unresolved Dinesh Appavoo Dinesh Appavoo Dinesh Appavoo 03/Feb/19 23:06   04/Feb/19 21:37       other   0 1   Metrics are already published for connected clients, but to find out why the connection count went up or down it would be better to expose per-user connection stats (i.e. some auth info for the connection).

Introduce a new `clnt` command, similar to `cons`, except it also displays the user-agent to identify the type of client used to connect. It lists full connection/session details for all clients connected to this server, including the number of packets received/sent, session id, operation latencies, last operation performed, user-agent, etc.
features 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks, 3 days ago 0|yi0mbs:
ZooKeeper ZOOKEEPER-3269

Testable facade would benefit from a queueEvent() method

New Feature Resolved Major Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 03/Feb/19 17:41   06/Feb/19 06:27 06/Feb/19 03:46   3.6.0 java client   0 3 0 10200   For testing and other reasons it would be very useful to add a way to inject an event into ZooKeeper's event queue. ZooKeeper already has the {{Testable}} facade for features such as this (low level, backdoor, testing, etc.). This queueEvent method would be particularly helpful to Apache Curator and we'd very much appreciate its inclusion.

The method should have the signature:

{code}
void queueEvent(WatchedEvent event);
{code}

Calling this would have the effect of queueing an event into the client's queue.
100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks, 1 day ago 0|yi0ma8:
ZooKeeper ZOOKEEPER-3268

ZOOKEEPER-3245 Add commit processor metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 02/Feb/19 00:50   25/Apr/19 03:06 24/Apr/19 20:12   3.6.0 server   0 2 0 28200   100% 100% 28200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
47 weeks ago 0|yi0lbs:
ZooKeeper ZOOKEEPER-3267

ZOOKEEPER-3245 Add watcher metrics

Sub-task Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 02/Feb/19 00:49   05/Mar/19 08:42 04/Mar/19 18:04 3.6.0 3.6.0 server   0 2 0 9000   100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 2 weeks, 2 days ago 0|yi0lbk:
ZooKeeper ZOOKEEPER-3266

ZooKeeper Java client blocks for a very long time.

Bug Open Major Unresolved Unassigned Jiafu Jiang Jiafu Jiang 31/Jan/19 20:54   26/Feb/19 06:47   3.4.13   java client   0 3   I found that the ZooKeeper Java client blocked; the related call stack is shown below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x00007f7deeadfd80 nid=0x5ec3 in Object.wait() [0x00007f7ddd5d8000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
- locked <0x00000000e04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
at com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
........

 

I also found that the blocked process did not have a SendThread. A normal process using the ZooKeeper Java client should have a SendThread, like below:

"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 tid=0x00007f8c540379c0 nid=0x739 runnable [0x00007f8c5ad71000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000e00287a8> (a sun.nio.ch.Util$3)
- locked <0x00000000e0028798> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000e0028750> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, could the missing SendThread cause the exists method to block? I'm not sure.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 2 days ago 0|yi0jvs:
ZooKeeper ZOOKEEPER-3265

Build failure on branch-3.4

Bug Closed Major Fixed Zsombor Gegesy Zsombor Gegesy Zsombor Gegesy 31/Jan/19 17:17   02/Apr/19 06:40 04/Feb/19 11:10 3.4.14 3.6.0, 3.5.5, 3.4.14 build   0 2 0 7800   Building Zookeeper branch-3.4 fails with Ant, if I try:
ant package:

{{package:
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue
[mkdir] Created dir: /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/test/java
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/test/java
[mkdir] Created dir: /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/main/java
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/main/java
[mkdir] Created dir: /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/main/c
[copy] Copying 15 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/recipes/queue/src/main/c
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper
[mkdir] Created dir: /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/dist-maven
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/dist-maven
[copy] Copying 2 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/dist-maven
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/dist-maven
[copy] Copying 2 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/bin
[copy] Copying 2 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/libexec
[copy] Copying 2 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/sbin
[copy] Copying 3 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/conf
[copy] Copying 304 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/docs
[copy] Copying 7 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT
[copy] Copying 72 files to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/src
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/templates/conf
[copy] Copying 1 file to /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/share/zookeeper/templates/conf

BUILD FAILED
/Users/test/src/zookeeper/build.xml:973: /Users/test/src/zookeeper/build/zookeeper-3.4.14-SNAPSHOT/src/zookeeper-contrib does not exist.
}}

The fileset which tries to locate executables in the contrib area doesn't match anything.
100% 100% 7800 0 build-failure, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
1 year, 6 weeks, 3 days ago 0|yi0jqo:
ZooKeeper ZOOKEEPER-3264

Add a benchmark tool for zookeeper

New Feature Open Major Unresolved maoling maoling maoling 30/Jan/19 22:57   04/Nov/19 21:28       other   0 3 0 1200   Reference:
https://github.com/etcd-io/etcd/blob/master/tools/benchmark/cmd/range.go
https://github.com/antirez/redis/blob/unstable/src/redis-benchmark.c
https://github.com/phunt/zk-smoketest/blob/master/zk-latencies.py
https://github.com/brownsys/zookeeper-benchmark/blob/master/src/main/java/edu/brown/cs/zkbenchmark/ZooKeeperBenchmark.java
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year ago 0|yi0idk:
ZooKeeper ZOOKEEPER-3263

Illegal reflective access in ZooKeeper's KerberosUtil

Improvement Closed Major Fixed Andor Molnar Pradeep Bansal Pradeep Bansal 30/Jan/19 20:06   16/Oct/19 14:58 20/May/19 11:19 3.4.13 3.6.0, 3.5.6     0 4 0 12000   I am using Kafka 2.11-2.1.0 with Java 11. Kafka is using zookeeper-3.4.13.jar, and when I run the kafka-acl script to manage ACLs, I get the warning below. Is there a way to resolve this?

{{WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.zookeeper.server.util.KerberosUtil (file://apache/kafka/kafka_2.11-2.1.0/libs/zookeeper-3.4.13.jar) to method sun.security.krb5.Config.getInstance() WARNING: Please consider reporting this to the maintainers of org.apache.zookeeper.server.util.KerberosUtil WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release}}
100% 100% 12000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
43 weeks, 3 days ago 0|yi0i9k:
ZooKeeper ZOOKEEPER-3262

Update dependencies flagged by OWASP report

Improvement Closed Blocker Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 30/Jan/19 15:50   02/Apr/19 06:40 08/Feb/19 00:07 3.6.0, 3.5.5, 3.4.14 3.6.0, 3.5.5, 3.4.14 security   0 3 0 19800   Currently OWASP plugin is reporting these vulnerabilities:
|[CVE-2018-14719|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-14719]|CWE-502 Deserialization of Untrusted Data|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2018-14720|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-14720]|CWE-611 Improper Restriction of XML External Entity Reference ('XXE')|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2018-14721|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-14721]|CWE-918 Server-Side Request Forgery (SSRF)|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2018-19360|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-19360]|CWE-502 Deserialization of Untrusted Data|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2018-19361|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-19361]|CWE-502 Deserialization of Untrusted Data|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2018-19362|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-19362]|CWE-502 Deserialization of Untrusted Data|High(7.5)|jackson-databind-2.9.5.jar|
|[CVE-2017-7657|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-7657]|CWE-190 Integer Overflow or Wraparound|High(7.5)|jetty-http-9.4.10.v20180503.jar   |
|[CVE-2017-7658|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-7658]|CWE-19 Data Processing Errors|High(7.5)|jetty-http-9.4.10.v20180503.jar   |
|[CVE-2018-1000873|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1000873]|CWE-20 Improper Input Validation|Medium(5.0)|jackson-databind-2.9.5.jar|
|[CVE-2017-7656|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-7656]|CWE-284 Improper Access Control|Medium(5.0)|jetty-http-9.4.10.v20180503.jar   |
|[CVE-2018-12536|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-12536]|CWE-200 Information Exposure|Medium(5.0)|jetty-http-9.4.10.v20180503.jar   |
|[CVE-2018-12056|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-12056]|CWE-338 Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)|Medium(5.0)|netty-all-4.1.29.Final.jar|

We have to upgrade all of them or add suppressions

 

In the Maven build we also have:

pom.xml: CVE-2018-8012, CVE-2016-5017
100% 100% 19800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks, 6 days ago 0|yi0hyo:
ZooKeeper ZOOKEEPER-3261

improve the "./zkServer.sh status" cmd to show the info about myid

Improvement Resolved Minor Won't Fix Unassigned maoling maoling 30/Jan/19 02:02   14/Oct/19 06:23 14/Oct/19 06:23     scripts   0 1   keep the standalone unchanged.
{code:java}
[root@959572f662cb bin]# ./zkServer.sh status
JMX enabled by default
Using config: /data/software/zookeeper/zookeeper-standalone-test/bin/../conf/zoo.cfg
Mode: standalone{code}

When in the quorum mode:
{code:java}
[root@959572f662cb bin]# ./zkServer.sh status
JMX enabled by default
Using config: /data/software/zookeeper/zookeeper-standalone-test/bin/../conf/zoo.cfg
Myid:2
Mode: standalone{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 1 day ago 0|yi0gw0:
ZooKeeper ZOOKEEPER-3260

Open JDK 11 support in Zookeeper

Bug Open Major Unresolved Unassigned Chintan Chintan 29/Jan/19 06:45   29/Jan/19 06:45           0 1   I tried to find an existing JIRA for this but could not find one. This JIRA is for introducing support for OpenJDK 11 in ZooKeeper. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 2 days ago 0|yi0fjk:
ZooKeeper ZOOKEEPER-3259

create a separate document, client.md, about the usage of the client

New Feature Open Major Unresolved maoling maoling maoling 28/Jan/19 06:43   19/Feb/19 07:04       documentation   0 2   1.1 Write the API section [2.1], which includes:
    1.1.1 The original API docs, which are auto-generated Javadoc; just give a link.
    1.1.2 The RESTful API (the APIs under /zookeeper-contrib-rest/src/main/java/org/apache/zookeeper/server/jersey/resources)
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks ago 0|yi0dqw:
ZooKeeper ZOOKEEPER-3258

Trust

Task Resolved Major Invalid Unassigned Shae Shae 27/Jan/19 02:33   27/Aug/19 09:52 27/Aug/19 09:52         0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 4 days ago 0|yi0cso:
ZooKeeper ZOOKEEPER-3257

Merge count and byte update of Stat

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 25/Jan/19 18:43   30/Jan/19 17:00 30/Jan/19 11:02 3.6.0 3.6.0 server   0 2 0 3000   There is duplication of effort when updating the stats. Merge the count update and the byte update into one call and simplify the logic. 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 1 day ago 0|yi0c60:
ZooKeeper ZOOKEEPER-3256

ZOOKEEPER-3021 Enable OWASP checks to Maven build

Sub-task Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 25/Jan/19 16:40   04/Oct/19 10:55 31/Jan/19 04:41   3.6.0, 3.5.5, 3.4.14 security   0 2 0 6600   Port the OWASP check task to the Maven build; the suppressionsFile is the same as in the Ant task.

use this command to run the check:
{code:java}
mvn org.owasp:dependency-check-maven:aggregate{code}
 

the Ant-based counterpart is:
{code:java}
ant owasp{code}
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks ago 0|yi0c0g:
ZooKeeper ZOOKEEPER-3255

add a banner to make the startup of zk server more cool

Improvement Resolved Minor Fixed maoling maoling maoling 24/Jan/19 07:28   04/Mar/19 05:24 04/Mar/19 02:07 3.5.4 3.6.0 server   1 2 0 3000   2019-01-24 11:27:37,370 [myid:] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-01-24 11:27:37,370 [myid:] - WARN [main:QuorumPeerMain@130] - Either no config or no quorum defined in config, running in standalone mode
2019-01-24 11:27:37,372 [myid:] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2019-01-24 11:27:37,387 [myid:] - INFO [main:QuorumPeerConfig@139] - Reading configuration from: /data/software/zookeeper/zookeeper-standalone-test/bin/../conf/zoo.cfg
2019-01-24 11:27:37,387 [myid:] - WARN [main:VerifyingFileFactory@59] - ../../zkdata is relative. Prepend ./ to indicate that you're sure!
2019-01-24 11:27:37,387 [myid:] - WARN [main:VerifyingFileFactory@59] - ../../zkdataLog is relative. Prepend ./ to indicate that you're sure!
2019-01-24 11:27:37,388 [myid:] - INFO [main:QuorumPeerConfig@402] - clientPortAddress is 0.0.0.0:2181
2019-01-24 11:27:37,389 [myid:] - INFO [main:QuorumPeerConfig@406] - secureClientPort is not set
2019-01-24 11:27:37,389 [myid:] - INFO [main:QuorumPeerConfig@423] - observerMasterPort is not set
2019-01-24 11:27:37,389 [myid:] - INFO [main:QuorumPeerConfig@441] - metricsProvider.className is org.apache.zookeeper.metrics.impl.NullMetricsProvider
2019-01-24 11:27:37,389 [myid:] - INFO [main:ZooKeeperServerMain@122] - Starting server
2019-01-24 11:27:37,419 [myid:] - INFO [main:ZookeeperBanner@87] -
2019-01-24 11:27:37,419 [myid:] - INFO [main:ZookeeperBanner@87] - ______ _
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - |___ / | |
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - / / ___ ___ | | __ ___ ___ _ __ ___ _ __
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - / / / _ \ / _ \ | |/ / / _ \ / _ \ | '_ \ / _ \ | '__|
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - / /__ | (_) | | (_) | | < | __/ | __/ | |_) | | __/ | |
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - /_____| \___/ \___/ |_|\_\ \___| \___| | .__/ \___| |_|
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - | |
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] - |_|
2019-01-24 11:27:37,420 [myid:] - INFO [main:ZookeeperBanner@87] -
2019-01-24 11:27:37,425 [myid:] - INFO [main:Environment@109] - Server environment:zookeeper.version=3.6.0-SNAPSHOT-29f9b2c1c0e832081f94d59a6b88709c5f1bb3ca, built on 01/17/2019 12:32 GMT
2019-01-24 11:27:37,425 [myid:] - INFO [main:Environment@109] - Server environment:host.name=959572f662cb
2019-01-24 11:27:37,425 [myid:] - INFO [main:Environment@109] - Server environment:java.version=1.8.0_112
2019-01-24 11:27:37,425 [myid:] - INFO [main:Environment@109] - Server environment:java.vendor=Oracle Corporation
2019-01-24 11:27:37,425 [myid:] - INFO [main:Environment@109] - Server environment:java.home=/usr/java/jdk1.8.0_112/jre
------------------------------------------------------------------------------------------------------------------------------
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
1 year, 2 weeks, 3 days ago 0|yi09io:
ZooKeeper ZOOKEEPER-3254

Drop 'beta' qualifier from Branch 3.5

Task Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 23/Jan/19 10:31   20/May/19 13:50 24/Jan/19 09:29 3.5.5 3.5.5 build   0 2 0 2400   100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks ago 0|yi083k:
ZooKeeper ZOOKEEPER-3253

client should not send requests with cxid=-4, -2, or -1

Bug Closed Minor Fixed Samuel Just Samuel Just Samuel Just 22/Jan/19 18:21   16/Dec/19 03:07 07/Mar/19 23:10 3.4.9, 3.5.4, 3.6.0 3.6.0, 3.5.5, 3.4.15 java client   0 4 172800 154200 18600 10% Once the cxid value increments to -4, the client will interpret the response as an auth packet rather than as a response to a request and will transparently drop the response and the request will hang.  Similarly, -2 will be seen as a ping and will be dropped hanging the request.  -1 shows up as a WatcherEvent and results in the error below.

 
{quote}2019-01-07T21:58:23.209+00:00 [INFO ] [main-SendThread(mnds1-2-phx.ops.sfdc.net:2181)] [ClientCnxn.java:1381] [:] - Session establishment complete on server mnds1-2-phx.ops.sfdc.net/10.246.244.71:2181, sessionid = 0x267859729d66320, negotiated timeout = 10000
2019-01-07T21:58:22.281+00:00 20190107215822.281000 [WARN ] [main-SendThread(mnds1-3-phx.ops.sfdc.net:2181)] [ClientCnxn.java:1235] [:] - Session 0x267859729d66320 for server mnds1-3-phx.ops.sfdc.net/10.246.244.69:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Unreasonable length = 892612659
at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.jute.BinaryInputArchive.readString(BinaryInputArchive.java:81) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.zookeeper.proto.WatcherEvent.deserialize(WatcherEvent.java:66) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:839) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) ~[zookeeper-3.5.3-beta.jar:3.5.3-beta-8ce24f9e675cbefffb8f21a47e06b42864475a60]
{quote}
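One way the intent of the fix can be sketched (hypothetical names, not the actual patch) is an allocator that steps over the reserved values when the counter wraps:

```java
public class XidAllocator {
    // Reserved xids listed above; responses carrying them are
    // interpreted as notification, ping, and auth packets.
    static final int NOTIFICATION_XID = -1;
    static final int PING_XID = -2;
    static final int AUTH_XID = -4;

    private int lastXid;

    public XidAllocator(int start) {
        this.lastXid = start;
    }

    // Advance the counter, skipping reserved values so a normal
    // request can never collide with them after rollover.
    public synchronized int nextXid() {
        do {
            lastXid++; // wraps silently past Integer.MAX_VALUE
        } while (lastXid == NOTIFICATION_XID
                || lastXid == PING_XID
                || lastXid == AUTH_XID);
        return lastXid;
    }
}
```

With this guard, a counter sitting at -5 hands out -3 next (skipping the auth value), and one sitting at -3 hands out 0 (skipping ping and notification).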
 
10% 10% 18600 154200 172800 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 1 week, 6 days ago
Reviewed
0|yi071c:
ZooKeeper ZOOKEEPER-3252

Extend the options for the response cache

Improvement Open Minor Unresolved Unassigned Brian Nixon Brian Nixon 18/Jan/19 16:35   18/Jan/19 16:35       server   0 1   The response cache added in ZOOKEEPER-3180 is fairly bare bones. It does its job but there is room for experimentation and improvement. From the issue pull request ([https://github.com/apache/zookeeper/pull/684]):
{quote}"...the alternate eviction policies you outline and that LinkedHashMap allows. I see three reasonable paths here:
* Merge this PR as it is (perhaps rename LRUCache to just Cache) and open a new JIRA to explore future paths.
* I add another property that lets one toggle between insertion order and access order, with the current implementation as the default.
* Drop LinkedHashMap entirely and go with something like a Guava Cache."{quote}

It was merged with path 1 chosen, but I remain interested in the optimizations that were suggested.
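The toggle between insertion order and access order mentioned above is a one-flag change in LinkedHashMap; a minimal sketch, with illustrative names rather than the actual ZooKeeper cache class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded map that evicts the least-recently-ACCESSED entry instead of
// the least-recently-inserted one, by setting accessOrder = true in the
// LinkedHashMap constructor.
public class AccessOrderCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public AccessOrderCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the eldest (least-recently-accessed) entry once full.
        return size() > capacity;
    }
}
```

With capacity 2, putting "a" and "b", reading "a", then putting "c" evicts "b" rather than "a"; with the default insertion order, the same sequence would evict "a".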
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 6 days ago 0|yi02zc:
ZooKeeper ZOOKEEPER-3251

ZOOKEEPER-3245 Add new server metric types: percentile counter and counter set

Sub-task Resolved Major Fixed Jie Huang Jie Huang Jie Huang 18/Jan/19 15:08   31/Jan/19 11:45 30/Jan/19 10:43   3.6.0 server   0 2 0 7800   This will add three metric types:

AvgMinMaxCounterSet

AvgMinMaxPercentileCounter

AvgMinMaxPercentileCounterSet

The percentile metrics allow us to get a better sense of the latency distribution. They are more expensive than AvgMinMax counters and are restricted to latency measurements for now.

The counter set allows the grouping of metrics such as write per namespace, read per namespace.
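The cheaper AvgMinMax counter contrasted above can be sketched as follows (names and structure assumed for illustration; not the actual ZooKeeper implementation):

```java
// Tracks count, sum, minimum, and maximum of observed values; far cheaper
// than a percentile counter because it keeps only four numbers.
public class AvgMinMaxCounter {
    private long count;
    private long total;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    public synchronized void add(long value) {
        count++;
        total += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    public synchronized double avg() {
        return count == 0 ? 0.0 : (double) total / count;
    }

    public synchronized long min() { return count == 0 ? 0 : min; }
    public synchronized long max() { return count == 0 ? 0 : max; }
}
```

A percentile counter would additionally need a histogram or reservoir of samples, which is what makes it more expensive and why the issue restricts it to latency measurements.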

 
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks ago 0|yi02y0:
ZooKeeper ZOOKEEPER-3250

typo in doc - zookeeperInternals

Improvement Closed Trivial Fixed Unassigned liwenjie liwenjie 18/Jan/19 04:03   20/May/19 13:50 29/Jan/19 09:14   3.6.0, 3.5.5 documentation   0 3 0 2400   "has long as" -> "as long as" 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 2 days ago 0|yi025c:
ZooKeeper ZOOKEEPER-3249

Avoid reverting the cversion and pzxid during replaying txns with fuzzy snapshot

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 17/Jan/19 23:06   30/Jan/19 17:00 30/Jan/19 10:29 3.6.0 3.6.0 server   0 2 0 2400   The only case where we need [the tricky hack code|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1036-L1065] is the scenario below:

If the child is deleted due to a session close and re-created in a different global session after the parent is serialized, then during replay, because the node belongs to a different session, replaying the closeSession txn won't delete it anymore, and we'll get a NODEEXISTS error when replaying the createNode txn. In this case, we need to update the cversion and pzxid to the new values with the tricky code here.

This could be solved in ZOOKEEPER-3145 with an explicit CloseSessionTxn. In theory, with that code, we don't need this kind of hack anymore, but there is another case which could cause the cversion and pzxid to be reverted, and we still need to patch it; here is the scenario:

1. Start to take snapshot at T0
2. Txn T1 create /P/N1, set P's cversion and pzxid to (1, 1)
3. Txn T2 create /P/N2, set P's cversion and pzxid to (2, 2)
4. Txn T3 delete /P/N1, set P's pzxid to 3, which is (2, 3)

These states are captured in the fuzzy snapshot.

When loading the snapshot and txns during startup, based on the current code:

1. replay T1: since /P/N1 does not exist, the create succeeds and overwrites P's cversion and pzxid back to (1, 1)
2. replay T2: the node already exists, so we go through the hack code to patch cversion and pzxid, which become (2, 2)
3. replay T3: set P's pzxid to 3, giving (2, 3)

The final state is consistent thanks to the tricky patch code, but it's error-prone and hacky, and we should remove it. To make that possible, in this patch we check the cversion first and avoid reverting the cversion and pzxid when replaying txns.

We've also added metrics to verify that this logic is no longer triggered in production; after that, I'll open another Jira to remove it and make the logic cleaner.
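The forward-only update described above can be sketched in a few lines. This is an illustrative sketch, not the actual DataTree patch; the `ParentStat` class and `applyReplayedTxn` method are hypothetical names standing in for ZooKeeper's internal stat handling:

```java
// Sketch of the guard: only move the parent's cversion/pzxid forward when
// replaying txns, so a stale txn replayed on top of a fuzzy snapshot cannot
// revert them. ParentStat/applyReplayedTxn are hypothetical names.
class ParentStat {
    int cversion;
    long pzxid;

    ParentStat(int cversion, long pzxid) {
        this.cversion = cversion;
        this.pzxid = pzxid;
    }

    // Apply the (cversion, zxid) carried by a replayed create/delete txn.
    void applyReplayedTxn(int txnCversion, long txnZxid) {
        if (txnCversion > this.cversion) { // skip txns already reflected in the snapshot
            this.cversion = txnCversion;
            this.pzxid = txnZxid;
        }
    }
}
```

In the scenario above, replaying T1 against the snapshot state (2, 3) would now leave the parent untouched instead of reverting it to (1, 1).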
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 1 day ago 0|yi01x4:
ZooKeeper ZOOKEEPER-3248

EmbeddedZookeeper does not delete temp directory on shutdown in Windows

Bug Open Major Unresolved Unassigned Dmitrii Apanasevich Dmitrii Apanasevich 17/Jan/19 02:10   17/Jan/19 02:10           0 2   Windows 7, Java 8  I consider this a serious problem. For example, a single run of twenty tests leaves ~1Gb of garbage on the system disk (64Mb per test), so it's easy to run out of disk space.

I've created [a simple example on GitHub|https://github.com/apanasevich/embedded-servers-example] that demonstrates the problem. Here is the exception thrown by EmbeddedZookeeper#shutdown:
{code:java}
java.nio.file.FileSystemException: C:\Users\D15E0~1.APA\AppData\Local\Temp\kafka-8507336206769425170\version-2\log.1: The process cannot access the file because it is being used by another process

at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:757)
at org.apache.kafka.common.utils.Utils$2.visitFile(Utils.java:746)
at java.nio.file.Files.walkFileTree(Files.java:2670)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:746)
at kafka.zk.EmbeddedZookeeper.shutdown(EmbeddedZookeeper.scala:63)
at example.TempDirectoriesTest.tearDownTest(TempDirectoriesTest.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:748)
{code}
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks ago 0|yi00ig:
ZooKeeper ZOOKEEPER-3247

New lest admin command to get leader election time

New Feature Open Major Unresolved Unassigned Dinesh Appavoo Dinesh Appavoo 12/Jan/19 18:29   12/Jan/19 18:29       leaderElection   0 1   Add lest admin command to get the last leader election time. features 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks, 5 days ago 0|u00s6g:
ZooKeeper ZOOKEEPER-3246

Publish more stats when learner gets a diff, full snapshot, or does a truncate when it connects with the leader

New Feature Open Major Unresolved Unassigned Dinesh Appavoo Dinesh Appavoo 12/Jan/19 02:38   12/Jan/19 02:38       other   0 1   Problem
There is currently no way to tell whether a learner gets a diff or a full snapshot, or does a truncate, when it connects with the leader, nor how long syncing with the leader takes. There is no explicit, general indicator of how often a learner has to connect to a leader short of grepping through zookeeper.log.

Solution
Start tracking and exporting the following three items:
* counter incremented each time a learner syncs with a leader
* the type of sync that was needed
* how long the sync took
features 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks, 5 days ago 0|u00rs8:
ZooKeeper ZOOKEEPER-3245

Add useful metrics for ZK pipeline and request/server states

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 11/Jan/19 22:22   08/Jul/19 17:32 07/Jun/19 10:54   3.6.0     0 1 0 3000   ZOOKEEPER-3251, ZOOKEEPER-3267, ZOOKEEPER-3268, ZOOKEEPER-3305, ZOOKEEPER-3309, ZOOKEEPER-3310, ZOOKEEPER-3313, ZOOKEEPER-3319, ZOOKEEPER-3321, ZOOKEEPER-3323, ZOOKEEPER-3324, ZOOKEEPER-3325, ZOOKEEPER-3326, ZOOKEEPER-3327, ZOOKEEPER-3328, ZOOKEEPER-3379, ZOOKEEPER-3383, ZOOKEEPER-3401 Add metrics to track time spent in the commit processor, watch counts and fire rates, how long a Zookeeper server is unavailable between elections, quorum packet size and time spent in the queue, aggregate request states/flow, request throttle, sync processor queue time, per-connection read and write request counts, commit processor queue sizes(read/write/commit), final request processor read/write times, watch manager cnxn/path counts, latencies at different points in pipeline for commits/informs, split up request type counters for more request types, export sum metrics for all AvgMinMax counters, per-connection watch fired counts, ack latency for each follower, percentile metrics to zeus latency counters, proposal count, number of outstanding changes,  snapshot and txns loading time during startup, number of non-voting followers, leader unavailable time, etc.

 
100% 100% 185400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 5 days ago 0|u00rns:
ZooKeeper ZOOKEEPER-3244

Add option to snapshot based on log size

New Feature Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 11/Jan/19 19:28   15/May/19 17:08 14/May/19 11:11   3.6.0 server   0 2 0 13200   Currently, ZooKeeper only takes snapshots based on the snap count. If the workload on an ensemble includes large txns, we'll end up with a large amount of data kept on disk and might run into low disk space.

Add a maximum limit on the total size of the log files between snapshots. This will change the snap frequency, which means that with the same snap retention number a server will use less disk.
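The trigger described above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions: the helper name `shouldSnapshot` and its parameters are hypothetical, not ZooKeeper's actual SyncRequestProcessor code:

```java
// Hypothetical sketch of a combined count/size snapshot trigger.
class SnapTrigger {
    // Snapshot when either the txn count or the accumulated log size since
    // the last snapshot crosses its limit; a size limit <= 0 disables that check.
    static boolean shouldSnapshot(long txnsSinceSnap, long snapCount,
                                  long logBytesSinceSnap, long snapSizeLimitBytes) {
        return txnsSinceSnap >= snapCount
            || (snapSizeLimitBytes > 0 && logBytesSinceSnap >= snapSizeLimitBytes);
    }
}
```

With a size limit in place, a workload of few but very large txns snapshots sooner, so the same retention count keeps less data on disk.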

 
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
44 weeks, 2 days ago 0|u00rko:
ZooKeeper ZOOKEEPER-3243

Add server side request throttling

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 11/Jan/19 14:34   08/Jul/19 19:50 08/Jul/19 17:26   3.6.0 server   1 3 0 26400   On-going performance investigation at Facebook has demonstrated that Zookeeper is easily overwhelmed by spikes in connection rates and/or write request rates. Zookeeper performance gets progressively worse, clients timeout and try to reconnect (exacerbating the problem) and things enter a death spiral. To solve this problem, we need to add load protection to Zookeeper via rate limiting and work shedding.

This JIRA task adds a new request throttling mechanism (RequestThrottler) to Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during request spikes.
 
When enabled, the RequestThrottler limits the number of outstanding requests currently submitted to the request processor pipeline.
 
The throttler augments the limit imposed by the globalOutstandingLimit that is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The connection layer limit applies backpressure against the TCP connection by disabling selection on connections once the request limit is reached. However, the connection layer always allows a connection to send at least one request before disabling selection on that connection. Thus, in a scenario with 40000 client connections, the total number of requests in flight may be as high as 40000 even if the globalOutstandingLimit was set lower.
 
The RequestThrottler addresses this issue by adding additional queueing. When enabled, client connections no longer submit requests directly to the request processor pipeline but instead to the RequestThrottler. The RequestThrottler is then responsible for issuing requests to the request processors and enforces a separate maxRequests limit. If the total number of outstanding requests is higher than maxRequests, the throttler repeatedly stalls for stallTime milliseconds until the count drops below the limit.
 
The RequestThrottler can also optionally drop stale requests rather than submit them to the processor pipeline. A stale request is one sent on a connection that is already closed, and/or one whose latency will end up exceeding its associated session timeout.
To preserve ordering guarantees, if a request is ever dropped from a connection, that connection is closed and flagged as invalid. All subsequent in-flight requests from that connection are then dropped as well.
 
The notion of staleness is configurable, both connection staleness and latency staleness can be individually enabled/disabled. Both these settings and the various throttle settings (limit, stall time, stale drop) can be configured via system properties as well as at runtime via JMX.
 
The throttler has been tested and benchmarked at Facebook.
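The stall-until-under-limit contract described above can be sketched as follows. This is a hedged illustration only: the real RequestThrottler is a dedicated thread draining a queue, and `throttleBeforeSubmit` is a hypothetical helper showing just the maxRequests/stallTime behavior:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the maxRequests/stallTime contract.
class RequestThrottleSketch {
    // Block until the outstanding count drops below maxRequests,
    // re-checking every stallTimeMs; maxRequests <= 0 disables throttling.
    static void throttleBeforeSubmit(AtomicInteger outstanding, int maxRequests,
                                     long stallTimeMs) throws InterruptedException {
        if (maxRequests <= 0) {
            return; // throttling disabled
        }
        while (outstanding.get() >= maxRequests) {
            Thread.sleep(stallTimeMs); // stall, then re-check
        }
    }
}
```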
100% 100% 26400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
36 weeks, 3 days ago 0|u00rb4:
ZooKeeper ZOOKEEPER-3242

Add server side connecting throttling

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 11/Jan/19 14:04   15/Aug/19 10:52 23/Jan/19 08:00   3.6.0 server   0 3 0 11400   On-going performance investigation at Facebook has demonstrated that Zookeeper is easily overwhelmed by spikes in connection rates and/or write request rates. Zookeeper performance gets progressively worse, clients timeout and try to reconnect (exacerbating the problem) and things enter a death spiral. To solve this problem, we need to add load protection to Zookeeper via rate limiting and work shedding.
 
This Jira adds a new connection rate limiting mechanism to Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during connection spikes. 
The new throttle is focused on limiting connections per second. The throttle is implemented as a token-bucket with optional probabilistic dropping based on the BLUE queue management algorithm.
 
This token-bucket design allows the throttle to allow short bursts to pass, while still capping the total number of requests per second. However, an issue with a token bucket approach is that the wall clock arrival time of requests affects the probability of a request being allowed to pass or not. Under constant load this can lead to request starvation for requests that constantly arrive later than the majority. The optional probabilistic dropping mechanism is designed to combat this, making rejections a random event with little skew based on arrival time.
 
A more verbose description can be found in the comments in org.apache.zookeeper.server.BlueThrottle.
 
By default, both the token-bucket and probabilistic dropping mechanism are disabled. Enabling and tuning the throttles can be done both via Java system properties as well as against a running node via JMX.
 
The throttle has been tested and benchmarked at Facebook.
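The token-bucket behavior described above can be sketched as follows. This is a minimal illustration with hypothetical field and method names, not the actual org.apache.zookeeper.server.BlueThrottle code, and it omits the probabilistic (BLUE) dropping layer:

```java
// Illustrative token bucket: allows bursts up to maxTokens while capping the
// sustained rate at fillCount tokens per fillTimeMs interval. Names are
// hypothetical, for illustration only.
class ConnectionTokenBucket {
    private int tokens;            // tokens currently available
    private final int maxTokens;   // bucket depth = allowed burst size
    private final int fillCount;   // tokens added per fill interval
    private final long fillTimeMs; // fill interval length
    private long lastFillMs;

    ConnectionTokenBucket(int maxTokens, int fillCount, long fillTimeMs, long nowMs) {
        this.tokens = maxTokens;
        this.maxTokens = maxTokens;
        this.fillCount = fillCount;
        this.fillTimeMs = fillTimeMs;
        this.lastFillMs = nowMs;
    }

    // Refill based on elapsed time, then spend one token per connection attempt.
    boolean checkLimit(long nowMs) {
        long intervals = (nowMs - lastFillMs) / fillTimeMs;
        if (intervals > 0) {
            tokens = (int) Math.min(maxTokens, tokens + intervals * fillCount);
            lastFillMs += intervals * fillTimeMs;
        }
        if (tokens <= 0) {
            return false; // bucket empty: drop the connection attempt
        }
        tokens--;
        return true;
    }
}
```

Note how the decision depends on wall-clock arrival time: a request arriving just before a refill is rejected while one arriving just after passes, which is the starvation skew the probabilistic dropping layer is designed to smooth out.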
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 1 day ago 0|u00ra0:
ZooKeeper ZOOKEEPER-3241

Update C client for the new getEphemerals api

New Feature Open Major Unresolved Unassigned Dinesh Appavoo Dinesh Appavoo 10/Jan/19 16:48   12/Jan/19 02:26       c client   0 1   There is a new API `getEphemerals()` [https://github.com/apache/zookeeper/pull/735] being introduced in the ZooKeeper server. Update the C client to support this API and its parameters. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks ago 0|u00q00:
ZooKeeper ZOOKEEPER-3240

Close socket on Learner shutdown to avoid dangling socket

Improvement Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 10/Jan/19 15:18   02/Jul/19 12:53 02/Jul/19 01:34 3.6.0 3.6.0 server   0 3 0 10800   There was a Learner that had two connections to the Leader after the Learner hit an unexpected exception while flushing txns to disk, which shut down the previous follower instance and restarted a new one.
 
{quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:3
java.io.IOException: Input/output error
        at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
        at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - Thread SyncThread:3 exits, error code 1
2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - SyncRequestProcessor exited!{quote}
 
The code is supposed to close the previous socket, but this doesn't seem to be done anywhere. This leaves the socket open with no one reading from it, which filled the queue and blocked the sender.
 
Since the LearnerHandler didn't shut down gracefully, the learner queue size keeps growing; the JVM heap size on the leader keeps growing, adding pressure on the GC and causing high GC times and latency in the quorum.
 
The simple fix is to shut down the socket gracefully.
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
37 weeks, 2 days ago 0|u00pyg:
ZooKeeper ZOOKEEPER-3239

Adding EnsembleAuthProvider to verify the ensemble name

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 09/Jan/19 12:44   13/Feb/19 12:44 13/Feb/19 07:19   3.6.0     0 2 0 6000   This AuthenticationProvider checks that the ensemble name the client intends to connect to matches the name the server thinks it belongs to. If the names do not match, this provider closes the connection.

This AuthenticationProvider does not "authenticate" the client; it prevents the client from accidentally connecting to the wrong ensemble.

This feature has been implemented in Facebook's internal branch, and I'm going to upstream it to trunk.
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 5 weeks, 1 day ago 0|u00oaw:
ZooKeeper ZOOKEEPER-3238

Add rel="noopener noreferrer" to target blank link in zookeeper-contrib-huebrowser

Improvement Resolved Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 09/Jan/19 07:57   31/Jan/19 11:45 31/Jan/19 08:18   3.6.0     0 2 0 1800   In zookeeper-contrib-huebrowser, there is a link that uses target="_blank". Best security practice is to also add rel="noopener noreferrer". See for example: https://dev.to/ben/the-targetblank-vulnerability-by-example 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks ago 0|u00nzc:
ZooKeeper ZOOKEEPER-3237

Allow IPv6 wildcard address in peer config

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 08/Jan/19 20:07   16/May/19 15:24 16/May/19 10:25 3.6.0 3.6.0 server   0 2 0 9000   ZooKeeper allows a special exception for the IPv4 wildcard, 0.0.0.0, along with the loopback addresses. Extend the same treatment to IPv6's wildcard, [::]. Otherwise, reconfig will reject commands with the form [::]:<port>. 100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
44 weeks ago 0|u00new:
ZooKeeper ZOOKEEPER-3236

Upgrade BouncyCastle

Improvement Closed Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 07/Jan/19 11:48   20/May/19 13:50 29/Jan/19 10:09   3.6.0, 3.5.5     0 2 0 2400   BouncyCastle should be upgraded to the latest release. The current version we are picking up contains security advisories:

bcprov-jdk15on-1.56.jar (cpe:/a:bouncycastle:bouncy-castle-crypto-package:1.56, org.bouncycastle:bcprov-jdk15on:1.56, cpe:/a:bouncycastle:legion-of-the-bouncy-castle-java-crytography-api:1.56, cpe:/a:bouncycastle:bouncy_castle_crypto_package:1.56) : CVE-2017-13098, CVE-2018-1000180, CVE-2018-1000613
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 2 days ago 0|u00l5s:
ZooKeeper ZOOKEEPER-3235

Enable secure processing and disallow DTDs in the SAXParserFactory

Improvement Closed Major Fixed Colm O hEigeartaigh Colm O hEigeartaigh Colm O hEigeartaigh 07/Jan/19 09:58   20/May/19 13:50 09/Jan/19 09:10 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5 jute   0 2   We should enable the secure processing feature and disallow DTDs in the SAXParserFactory. This prevents a number of possible XXE style attacks. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 1 day ago 0|u00kwo:
ZooKeeper ZOOKEEPER-3234

Add Travis-CI configuration file

Improvement Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 02/Jan/19 10:09   22/Feb/19 20:00 07/Feb/19 05:13 3.6.0 3.6.0 build   0 2 0 27600   Let's add Travis-CI in order to have a more user-friendly integration with contributors. 100% 100% 27600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 5 days ago 0|u00gb4:
ZooKeeper ZOOKEEPER-3233

ZOOKEEPER-3170 Run github pre-commit hook tests on 4 threads

Sub-task Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 02/Jan/19 08:39   04/Jan/19 10:22 03/Jan/19 05:24 3.6.0 3.6.0 tests   0 2 0 2400   Adjust the GitHub pre-commit hook script to run Java unit tests on only 4 threads (currently 8) in order to improve flaky test stability.
Test output can also be turned off, because test results are collected from XML files at the end of the process.
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 11 weeks ago 0|u00g9c:
ZooKeeper ZOOKEEPER-3232

make the log of notification about LE more readable

Improvement Resolved Minor Fixed maoling maoling maoling 01/Jan/19 08:09   18/Jan/19 01:28 17/Jan/19 21:50   3.6.0 leaderElection   0 3 0 1200   the log of notifications about leader election (LE) is very important for following the LE process, e.g.:
{code:java}
2019-01-01 16:29:27,494 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@595] - Notification: 1 (message format version), 3 (n.leader), 0x60b3dc215 (n.zxid), 0x3 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state){code}
the current log has some problems:
1. it doesn't use logging placeholders (it concatenates strings with +) and isn't in a consistent k:v style.
2. the properties in the log are not grouped or ordered, so it is not easy to read.
3. the version value is hex but lacks the 0x prefix.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 6 days ago 0|u00fjs:
ZooKeeper ZOOKEEPER-3231

Purge task may lost data when the recent snapshots are all invalid

Bug In Progress Major Unresolved maoling Jiafu Jiang Jiafu Jiang 28/Dec/18 23:05   30/Jan/20 23:14   3.5.4, 3.4.13   server   0 3 0 8400   Reading the ZooKeeper source code, I found that the purge task uses FileTxnSnapLog#findNRecentSnapshots to find snapshots, but that method does not check whether the snapshots are valid.

Consider a worst case: a ZooKeeper server may have many invalid snapshots, and when a purge task begins, it will use the zxid in the last snapshot's name to purge old snapshots and transaction logs; then we may lose data.

I think we should use FileSnap#findNValidSnapshots(int) instead of FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots, but I am not sure.
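The guard the report suggests can be sketched as follows. This is a hedged illustration with a hypothetical `PurgeGuard` class; the stand-in validity check (non-empty file) approximates the real FileSnap validation, which parses the snapshot header:

```java
import java.io.File;

// Hypothetical sketch: pick the purge boundary only from snapshots that
// look usable, skipping zero-length or missing files.
class PurgeGuard {
    // Walk snapshots from newest to oldest and return the first usable one,
    // or null if none qualify. Real code would parse the header instead of
    // just checking the length.
    static File newestValidSnapshot(File[] snapshotsNewestFirst) {
        for (File snap : snapshotsNewestFirst) {
            if (snap.exists() && snap.length() > 0) {
                return snap;
            }
        }
        return null;
    }
}
```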

 
100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 2 days ago 0|u00du8:
ZooKeeper ZOOKEEPER-3230

Add Apache NetBeans Maven project files to .gitignore

Task Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 22/Dec/18 08:38   02/Apr/19 06:40 22/Dec/18 21:01 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 other   0 2   Now that we are on Maven, NetBeans uses a different set of files.

I would like to add them to .gitignore.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 12 weeks, 4 days ago
Reviewed
0|u009a0:
ZooKeeper ZOOKEEPER-3229

ZOOKEEPER-3451 [TLS] add AES-256 ciphers to default cipher list

Sub-task Closed Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 20/Dec/18 13:23   01/Jul/19 10:53 25/Jan/19 08:38 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 3600   Let's add AES-256 ciphers to the default cipher suite list, so clients that prefer using 256-bit symmetric keys can connect by default. 100% 100% 3600 0 pull-request-available, ssl-tls 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 6 days ago 0|u007c8:
ZooKeeper ZOOKEEPER-3228

[TLS] Fix key usage extension in test certs

Improvement Closed Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 20/Dec/18 13:17   20/May/19 13:50 02/Jan/19 07:41 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2   Key usage extension is wrong in test certs created by X509TestHelpers. This works with Java SSL stack because it allows sloppy certs, but breaks with Netty's OpenSSL stack. My Netty OpenSSL code is not ready for upstream yet, but fixing the test cert extensions is a prerequisite and can go in separately. ssl-tls 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 11 weeks, 1 day ago 0|u007c0:
ZooKeeper ZOOKEEPER-3227

Address Spotbugs: DM_DEFAULT_ENCODING issues

Bug Open Major Unresolved Unassigned Enrico Olivelli Enrico Olivelli 20/Dec/18 11:08   20/Dec/18 11:09   3.6.0   build   0 1   In a lot of places we are using the default encoding.

Spotbugs is not happy.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks ago 0|u0076g:
ZooKeeper ZOOKEEPER-3226

ZOOKEEPER-3021 Activate C Client with a profile, disabled by default

Sub-task Closed Major Fixed Norbert Kalmár Enrico Olivelli Enrico Olivelli 20/Dec/18 08:45   24/Sep/19 02:41 08/Jan/19 11:32 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, c client   0 3 0 4800   100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
25 weeks, 2 days ago 0|u006wo:
ZooKeeper ZOOKEEPER-3225

ZOOKEEPER-3021 Create code coverage analysis with maven build

Sub-task Closed Blocker Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 20/Dec/18 04:01   02/Apr/19 06:40 08/Jan/19 04:37 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, scripts   0 1   Add clover and cobertura to maven build.

First, master only.
pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks ago 0|u006io:
ZooKeeper ZOOKEEPER-3224

ZOOKEEPER-3021 CI integration with maven

Sub-task Closed Blocker Fixed Enrico Olivelli Norbert Kalmár Norbert Kalmár 20/Dec/18 03:54   02/Apr/19 06:40 26/Feb/19 17:22 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2   Integrate maven build and audit with Jenkins.

First, master only.
pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks, 2 days ago Let's close it.
We have Travis + Pull Requests + Nightly build.
We will have to convert the remaining jobs when we drop ant (hopefully very soon, before 3.6.0 release)
0|u006i0:
ZooKeeper ZOOKEEPER-3223

ZOOKEEPER-3021 Configure Spotbugs

Sub-task Closed Blocker Fixed Enrico Olivelli Norbert Kalmár Norbert Kalmár 20/Dec/18 03:50   02/Apr/19 06:40 21/Jan/19 10:20 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, scripts   0 3 0 22800   First, master only. 100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 2 days ago 0|u006hc:
ZooKeeper ZOOKEEPER-3222

ZOOKEEPER-3170 Flaky: multiple intermittent segfaults in C++ tests

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 18/Dec/18 10:16   20/May/19 13:50 03/Jan/19 09:58 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5 c client   0 2 0 3000   There are multiple places in the C++ tests where the zookeeper client handle (zkhandle_t) doesn't get closed correctly, causing intermittent segmentation faults.
E.g., in multi-threaded tests the IO thread remains open in these clients and can crash when trying to log to a log file that has already been closed by the test.
Another catch is when a test tries to validate an already closed client: the client struct cannot be accessed after its memory has been freed.
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 6 days ago 0|u003cg:
ZooKeeper ZOOKEEPER-3221

WriteLock in recipes may get wrong child name as lock id

Bug Open Critical Unresolved Unassigned Huo Zhu Huo Zhu 18/Dec/18 08:55   15/Jan/19 21:41       recipes   0 3 86400 86400 0% zookeeper-recipes-1.0 Recently I used WriteLock in my application and got the following exception:
{code:java}
Exception in thread "produce 1" java.lang.IllegalArgumentException: Path must start with / character
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:851)
at org.apache.zookeeper.recipes.lock.WriteLock$1.execute(WriteLock.java:118)
at org.apache.zookeeper.recipes.lock.WriteLock$1.execute(WriteLock.java:1)
at org.apache.zookeeper.recipes.lock.WriteLock.unlock(WriteLock.java:122)
{code}
The following function is called on lock acquisition. It uses an existing child node name as the inner lock id, which may conflict with another lock user; at the same time, the lock id is just the node name without the prefix path, causing a {color:#FF0000}java.lang.IllegalArgumentException{color} in the final delete operation.
{code:java}
private void findPrefixInChildren(String prefix, ZooKeeper zookeeper, String dir)
        throws KeeperException, InterruptedException {
    List<String> names = zookeeper.getChildren(dir, false);
    for (String name : names) {
        if (name.startsWith(prefix)) {
            id = name;
            if (LOG.isDebugEnabled()) {
                LOG.debug("Found id created last time: " + id);
            }
            break;
        }
    }
    if (id == null) {
        id = zookeeper.create(dir + "/" + prefix, data, getAcl(), EPHEMERAL_SEQUENTIAL);
        if (LOG.isDebugEnabled()) {
            LOG.debug("Created id: " + id);
        }
    }
}
{code}
 
0% 0% 86400 86400 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks, 1 day ago 0|u00388:
ZooKeeper ZOOKEEPER-3220

The snapshot is not saved to disk and may cause data inconsistency.

Bug Open Critical Unresolved Unassigned Jiafu Jiang Jiafu Jiang 18/Dec/18 04:55   26/Dec/18 23:08   3.4.12, 3.4.13   server   0 4   We know that the ZooKeeper server calls fsync to make sure log data has been successfully saved to disk. But the ZooKeeper server does not call fsync to make sure a snapshot has been successfully saved, which may cause problems: closing a file descriptor does not guarantee that the data has been written to disk; see [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.

 

If the snapshot is not successfully saved to disk, it may lead to data inconsistency. Here is my example, which is also a real problem I have met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3; zk2 was the leader.

2. Both zk1 and zk2 had the log records log1 ~ logX, where X was the zxid.

3. The machine of zk1 restarted, and during the reboot, log(X+1) ~ logY were saved to the log files of both zk2 (leader) and zk3 (follower).

4. After zk1 restarted successfully, it found itself to be a follower and began to synchronize data with the leader. The leader sent a snapshot (records log1 ~ logY) to zk1, and zk1 saved the snapshot to local disk by calling the method ZooKeeperServer.takeSnapshot. Unfortunately, when the method returned, the snapshot data had not yet been saved to disk: the snapshot file was created, but its size was 0.

5. zk1 finished the synchronization and began to accept new requests from the leader. Say log records log(Y+1) ~ logZ were accepted by zk1 and saved to its log file. With fsync, zk1 could make sure this log data was not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it was not used, so zk1 recovered from its log files. But the records log(X+1) ~ logY were lost!
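The remedy this report points at is to fsync the snapshot file before closing it, since close() alone gives no durability guarantee. A minimal sketch of that idea using plain java.io (not ZooKeeper's actual FileTxnSnapLog code):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch: force the snapshot bytes to disk before close() returns, so a crash
// right after takeSnapshot cannot leave a zero-length (unusable) snapshot file.
public class SnapshotSync {
    static void writeSnapshot(File f, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write(data);
            fos.flush();
            fos.getChannel().force(true); // fsync: durable before we return
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("snapshot", ".bin");
        f.deleteOnExit();
        writeSnapshot(f, new byte[]{1, 2, 3});
        System.out.println(f.length()); // all three bytes are on disk
    }
}
```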

 

Sorry for my poor English.

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 12 weeks ago 0|u002zc:
ZooKeeper ZOOKEEPER-3219

Fix flaky FileChangeWatcherTest

Improvement Resolved Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 17/Dec/18 18:22   20/Dec/18 11:52 20/Dec/18 09:09 3.6.0, 3.5.5 3.6.0     0 2   A test I committed recently is flaky. Here is an example of failed test output from jenkins:

 
{code:java}
2018-12-17 21:52:53,824 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@77] - RUNNING TEST METHOD testCallbackErrorDoesNotCrashWatcherThread
2018-12-17 21:52:53,826 [myid:] - INFO [FileChangeWatcher:FileChangeWatcher$WatcherThread@193] - FileChangeWatcher thread started
2018-12-17 21:52:54,830 [myid:] - INFO [main:FileChangeWatcherTest@237] - Modifying file
2018-12-17 21:52:54,834 [myid:] - INFO [FileChangeWatcher:FileChangeWatcherTest@222] - Got an update: ENTRY_CREATE zk_test_5141143184635472109
2018-12-17 21:52:54,835 [myid:] - ERROR [FileChangeWatcher:FileChangeWatcher$WatcherThread@238] - Error from callback
java.lang.RuntimeException: This error should not crash the watcher thread
    at org.apache.zookeeper.common.FileChangeWatcherTest.lambda$testCallbackErrorDoesNotCrashWatcherThread$4(FileChangeWatcherTest.java:226)
    at org.apache.zookeeper.common.FileChangeWatcher$WatcherThread.runLoop(FileChangeWatcher.java:236)
    at org.apache.zookeeper.common.FileChangeWatcher$WatcherThread.run(FileChangeWatcher.java:205)
2018-12-17 21:52:54,837 [myid:] - INFO [main:FileChangeWatcherTest@244] - Modifying file again
2018-12-17 21:52:54,837 [myid:] - INFO [FileChangeWatcher:FileChangeWatcherTest@222] - Got an update: ENTRY_MODIFY zk_test_5141143184635472109
2018-12-17 21:52:54,838 [myid:] - INFO [FileChangeWatcher:FileChangeWatcherTest@222] - Got an update: ENTRY_MODIFY zk_test_5141143184635472109
2018-12-17 21:52:54,839 [myid:] - INFO [FileChangeWatcher:FileChangeWatcher$WatcherThread@215] - FileChangeWatcher thread finished
2018-12-17 21:52:54,839 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testCallbackErrorDoesNotCrashWatcherThread
java.lang.AssertionError: expected:<2> but was:<3>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:834)
    at org.junit.Assert.assertEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:631)
    at org.apache.zookeeper.common.FileChangeWatcherTest.testCallbackErrorDoesNotCrashWatcherThread(FileChangeWatcherTest.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
    at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:53)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:38)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
2018-12-17 21:52:54,847 [myid:] - INFO [main:ZKTestCase$1@74] - FAILED testCallbackErrorDoesNotCrashWatcherThread
java.lang.AssertionError: expected:<2> but was:<3>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:834)
    at org.junit.Assert.assertEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:631)
    at org.apache.zookeeper.common.FileChangeWatcherTest.testCallbackErrorDoesNotCrashWatcherThread(FileChangeWatcherTest.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
    at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:53)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:38)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
2018-12-17 21:52:54,848 [myid:] - INFO [main:ZKTestCase$1@64] - FINISHED testCallbackErrorDoesNotCrashWatcherThread{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks ago 0|u002hk:
ZooKeeper ZOOKEEPER-3218

zk server reopened: the interval for the observer to connect to the new leader is too long, so the session expires

Bug Resolved Major Fixed Unassigned yangoofy yangoofy 17/Dec/18 15:55   18/Jan/19 01:28 17/Jan/19 21:45   3.6.0     0 4 0 1200   win7 32bits

zookeeper 3.4.6, 3.4.13
Two participants, one observer; zkclient connects to the observer.

Then close the two participants; the ZooKeeper service is down.

Ten seconds later, reopen the two participants, and a new leader is elected.

----------------------------------------------------------------------------

But the observer can't connect to the new leader immediately. In lookForLeader, the observer uses a BlockingQueue (recvqueue) to offer/poll notifications; when recvqueue is empty, the poll blocks, and the timeout doubles each round: 200ms, 400ms, 800ms, ..., up to 60s.

For example: at 09:59:59 the observer polls for a notification, recvqueue is empty, and the timeout is 60s; at 10:00:00 the two participants are reopened and a leader is re-elected; only at 10:00:59 does the observer's poll return, letting it connect to the new leader.

But maxSessionTimeout defaults to 40s, so the session expires.

-----------------------------------------------------------------------------

Please improve this: the observer should connect to the new leader as soon as possible.
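The doubling backoff described above can be sketched with a BlockingQueue whose poll timeout grows on each empty round. This is a minimal demo, not the actual FastLeaderElection code; the real poll waits for the growing timeout itself, while this demo polls for 1 ms so it runs instantly and only prints the timeout it would have used:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Demo of exponential poll backoff: 200 ms, 400 ms, ... capped at 60 s.
// An observer that starts a 60 s wait just before an election finishes
// can sleep through most of the session timeout, as the issue describes.
public class NotificationBackoff {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> recvqueue = new LinkedBlockingQueue<>();
        int timeoutMs = 200;
        final int maxTimeoutMs = 60_000;
        for (int i = 0; i < 5; i++) {
            // Real code would poll(timeoutMs, MILLISECONDS); shortened for the demo.
            String n = recvqueue.poll(1, TimeUnit.MILLISECONDS);
            if (n == null) { // queue empty: back off
                System.out.println("timeout would be " + timeoutMs + " ms");
                timeoutMs = Math.min(timeoutMs * 2, maxTimeoutMs);
            }
        }
    }
}
```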
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 8 weeks, 6 days ago 0|u002cw:
ZooKeeper ZOOKEEPER-3217

owasp job flagging slf4j on trunk

Bug Closed Critical Fixed Enrico Olivelli Patrick D. Hunt Patrick D. Hunt 14/Dec/18 19:26   02/Apr/19 06:40 03/Jan/19 10:34   3.6.0, 3.5.5, 3.4.14     0 1 0 5400   https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-owasp/204/artifact/build/test/owasp/dependency-check-vulnerability.html

https://nvd.nist.gov/vuln/detail/CVE-2018-8088

We don't use EventData but should consider upgrading.
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 6 days ago 0|s01jrc:
ZooKeeper ZOOKEEPER-3216

Make init/sync limit tunable via JMX

Improvement Resolved Minor Fixed Jie Huang Jie Huang Jie Huang 13/Dec/18 18:08   08/Jul/19 17:32 20/Dec/18 09:14   3.6.0 jmx   0 4   Add beans for initLimit and syncLimit so they can be adjusted through JMX 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
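Exposing a tunable like initLimit through JMX can be sketched as follows. This is a hypothetical demo bean, not ZooKeeper's actual implementation; it relies on the standard-MBean convention that the management interface is named after the implementation class plus "MBean":

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal standard MBean exposing a read/write attribute on the platform server.
public class LimitsDemo {
    // Interface name must be <implementation class>MBean for a standard MBean.
    public interface LimitsMBean {
        int getInitLimit();
        void setInitLimit(int v);
    }

    public static class Limits implements LimitsMBean {
        private volatile int initLimit = 10;
        public int getInitLimit() { return initLimit; }
        public void setInitLimit(int v) { initLimit = v; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Limits");
        server.registerMBean(new Limits(), name);                   // expose the tunable
        server.setAttribute(name, new Attribute("InitLimit", 20));  // adjust via JMX
        System.out.println(server.getAttribute(name, "InitLimit"));
    }
}
```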
1 year, 13 weeks ago 0|s01icw:
ZooKeeper ZOOKEEPER-3215

Handle Java 9/11 additions of covariant return types to java.nio.ByteBuffer methods

Bug Open Minor Unresolved Unassigned V V 13/Dec/18 15:54   17/Jan/19 12:35   3.4.13       0 3 0 6000   In Java 9, several java.nio.ByteBuffer methods were overridden with covariant return types: they now return ByteBuffer, whereas the inherited methods on the parent class Buffer return Buffer. Code compiled against JDK 9+ therefore references methods that do not exist on Java 8 and below, causing failures when it runs there.

Steps To Reproduce:
1. Setup ZooKeeper Server with JDK11.
2. Setup ZooKeeper Client with JDK8.
3. Try connecting the client and server.

Fix:
Cast ByteBuffer instances to Buffer before calling the method.
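The cast can be sketched as follows: routing the call through a Buffer reference makes javac emit a method descriptor (Buffer.flip() returning Buffer) that exists on Java 8 as well as on JDK 9+:

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

// Compile-safe on JDK 9+ while remaining runnable on Java 8.
public class BufferCompat {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putInt(42);
        // Without the cast, JDK 9+ compiles this as ByteBuffer.flip(),
        // which Java 8 cannot resolve; via Buffer it links everywhere.
        ((Buffer) bb).flip();
        System.out.println(bb.getInt());
    }
}
```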

 

Notes:
There was a similar bug in the MongoDB community - [https://jira.mongodb.org/browse/JAVA-2559]

 

This is not a contribution.
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks, 1 day ago 0|s01i6w:
ZooKeeper ZOOKEEPER-3214

Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter

Bug Resolved Minor Duplicate Unassigned maoling maoling 12/Dec/18 21:08   12/Dec/18 21:22 12/Dec/18 21:19     tests   0 3   more details in:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2901/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks ago 0|s01gx4:
ZooKeeper ZOOKEEPER-3213

Transaction log contains a delete record, but the node is not actually deleted

Bug Open Blocker Unresolved Unassigned miaojianlong miaojianlong 12/Dec/18 02:00   16/Dec/18 09:57   3.4.10   leaderElection, server   0 3   Linux

Java 1.8

ZK 3.4.10

server1: 10.35.104.123

server2: 10.35.104.124

server3: 10.35.104.125
# First I found that my Spark (2.2.0) master had turned to standby (HA mode with ZK), and I could not fix the problem by restarting the service.
# Then I found that there were three nodes in the /spark/leader_election/ directory: 48, 93, and 94. These are ephemeral sequential nodes, and 48 should have timed out. I looked at the transaction log, and it did contain a delete record for 48, but the data still exists.

The above phenomenon appears on the two nodes 10.35.104.123 and 10.35.104.125; only 93 and 94 are present on 10.35.104.124.

Unable to export logs because the cluster is on the company intranet.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 13 weeks, 4 days ago 0|s01fh4:
ZooKeeper ZOOKEEPER-3212

Fix website with adding doap.rdf back

Bug Resolved Major Fixed Tamas Penzes Tamas Penzes Tamas Penzes 11/Dec/18 04:15   11/Dec/18 05:15 11/Dec/18 05:06   3.6.0 other   0 1 0 1800   During the website migration the doap.rdf file was forgotten. It must be put back in its place. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks, 2 days ago 0|s01dy0:
ZooKeeper ZOOKEEPER-3211

zookeeper standalone mode: a high-level bug in the CentOS 7.0 kernel leaves the ZooKeeper server's TCP/IP socket connections (default 60) in CLOSE_WAIT, so zk can no longer serve clients

Bug Open Blocker Unresolved Unassigned yeshuangshuang yeshuangshuang 10/Dec/18 22:53   17/Sep/19 05:32   3.4.5 3.4.5 server   0 6 604800 604800 0% 1.zoo.cfg
server.1=127.0.0.1:2902:2903
2.kernel
kernel:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
JDK:
java version "1.7.0_181"
OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
zk: 3.4.5
3.bug details:
It occurs occasionally, but the recurrence probability is extremely high. At first, reads and writes time out after about 6s; after a few minutes, all connections (including long-lived ones) end up in the CLOSE_WAIT state.

4. Workaround: when all connections are found to be in CLOSE_WAIT, actively restart the ZooKeeper server side.
0% 0% 604800 604800 9223372036854775807 No Perforce job exists for this issue. 14 9223372036854775807
26 weeks, 2 days ago 0|s01do0:
ZooKeeper ZOOKEEPER-3210

Typo in zookeeperInternals doc

Bug Closed Trivial Fixed Unassigned Stanislav Knot Stanislav Knot 08/Dec/18 04:33   02/Apr/19 06:40 08/Jan/19 11:23   3.6.0, 3.5.5, 3.4.14     0 2 0 1800   100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 2 days ago 0|s01b4w:
ZooKeeper ZOOKEEPER-3209

New `getEphemerals` api to get all the ephemeral nodes created by the session

New Feature Resolved Major Fixed Dinesh Appavoo Dinesh Appavoo Dinesh Appavoo 07/Dec/18 18:25   16/Jan/19 20:58 16/Jan/19 08:21   3.6.0 other   0 2 0 9000   New API `getEphemerals()` to get all the ephemeral nodes created by the session, given a prefix path.
* Take the prefix path as an input parameter and return a list of strings (ephemeral node paths)
* If the prefix path is `/`, return all the ephemeral nodes created by the session
* Provide synchronous and asynchronous APIs with the same functionality
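The prefix-filtering semantics proposed above can be sketched as follows. The helper names are hypothetical; the real API would consult the server's session-to-ephemerals map rather than a client-side list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the proposed semantics: "/" returns everything, otherwise filter
// the session's ephemeral paths by the supplied prefix.
public class EphemeralsFilter {
    static List<String> getEphemerals(Collection<String> sessionEphemerals, String prefix) {
        if ("/".equals(prefix)) {
            return new ArrayList<>(sessionEphemerals);
        }
        // Simple startsWith match; a real implementation might require the
        // prefix to end at a path-segment boundary.
        return sessionEphemerals.stream()
                .filter(p -> p.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> eph = Arrays.asList("/app/lock-1", "/app/lock-2", "/other/x");
        System.out.println(getEphemerals(eph, "/app").size());
        System.out.println(getEphemerals(eph, "/").size());
    }
}
```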
100% 100% 9000 0 features, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks ago 0|s01ay8:
ZooKeeper ZOOKEEPER-3208

Remove the SSLTest.java.orig introduced in ZOOKEEPER-3032

Improvement Resolved Trivial Fixed Fangmin Lv Fangmin Lv Fangmin Lv 05/Dec/18 17:21   11/Dec/18 11:14 11/Dec/18 08:08   3.6.0     0 2 0 2400   This file was introduced by mistake when we were doing the Maven migration in ZOOKEEPER-3032. 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks, 2 days ago 0|s017sg:
ZooKeeper ZOOKEEPER-3207

Watch related code being copied over twice when doing maven migration

Bug Open Minor Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 04/Dec/18 15:06   14/Dec/19 06:06     3.7.0     0 3 0 1800   Files like WatchManager.java and WatchesPathReport.java exist in both the org/apache/zookeeper/server and org/apache/zookeeper/server/watch folders; org/apache/zookeeper/server/watch is the right one. It looks like we introduced the other copy by mistake in ZOOKEEPER-3032. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks, 2 days ago 0|s01614:
ZooKeeper ZOOKEEPER-3206

Can't use Active Directory for Kerberos Authentication

Bug Open Major Unresolved Unassigned Stephane Maarek Stephane Maarek 03/Dec/18 09:23   03/Dec/18 09:24   3.4.13   kerberos   0 1   We're using Active Directory, and created service principals this way:

{code}
ktpass -princ ZOOKEEPER/host-1@TEST -mapuser zookeeper -mapOp add -Target TEST
ktpass -princ ZOOKEEPER/host-2@TEST -mapuser zookeeper -mapOp add -Target TEST
ktpass -princ ZOOKEEPER/host-3@TEST -mapuser zookeeper -mapOp add -Target TEST
{code}

Using this format, one is not able to do {code}kinit ZOOKEEPER/host-1@TEST{code}, but one is able to do {code}kinit zookeeper@TEST -S ZOOKEEPER/host-1@TEST{code} to obtain a service ticket.

In the Kafka project, it is fine for the JAAS file to have {code}principal="kafka@TEST"{code}, and it seems to automatically acquire the correct service ticket (I'm not sure how).

In ZooKeeper, things fail when a client tries to connect, due to this line:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/util/SecurityUtils.java#L170

It'd be great for the ZooKeeper server to have the same kind of mechanism as Kafka for accepting client connections. This would allow us to have {code}principal="zookeeper@TEST"{code} in JAAS. Otherwise, maybe support a new JAAS option so we can explicitly name the service?

FYI: trying {code}principal="zookeeper/host-1@TEST"{code} does not work; due to how Active Directory works, it complains that the credentials cannot be found in the database (as we try to authenticate using the service name, not the user name).

I'm attaching some documentation I find relevant: https://serverfault.com/questions/682374/client-not-found-in-kerberos-database-while-getting-initial/683058#683058
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 15 weeks, 3 days ago 0|s013wg:
ZooKeeper ZOOKEEPER-3205

Jute - o.a.jute.BinaryInputArchive Test cases

Test Resolved Minor Fixed Karthik K Karthik K Karthik K 26/Nov/18 10:55   02/Jan/19 11:40 02/Jan/19 09:10 3.5.4 3.6.0 jute   0 2 0 4800   o.a.j.  BinaryInput(Output)Archive handles a bunch of serialization / deserialization for various types. 

It would be good to have a set of test cases asserting this behavior, as currently almost none exist.
 
Attaching the patch herewith.

Also added test cases for o.a.jute.Utils(Test) as well.
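A round-trip assertion in the spirit of the tests proposed above might look like this sketch, written against plain java.io streams rather than Jute's actual Archive API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Serialize a couple of values, deserialize them, and assert they round-trip,
// analogous to what BinaryOutputArchive/BinaryInputArchive tests would check.
public class RoundTripTest {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(42);
        out.writeUTF("zookeeper");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        if (in.readInt() != 42) throw new AssertionError("int round trip failed");
        if (!in.readUTF().equals("zookeeper")) throw new AssertionError("string round trip failed");
        System.out.println("round trip ok");
    }
}
```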
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
1 year, 11 weeks, 1 day ago 0|s00vaw:
ZooKeeper ZOOKEEPER-3204

Reconfig tests are constantly failing on 3.5 after applying Java 11 fix

Test Closed Blocker Fixed Andor Molnar Andor Molnar Andor Molnar 26/Nov/18 05:37   20/May/19 13:50 14/Feb/19 11:39 3.5.5 3.5.5 tests   0 1 0 69600   The following reconfig tests have been failing since we committed the fix for Java 11, which was intended to fix port-binding problems on Java 11. The tests are not flaky; they have failed in every build since #15.

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java11/15/

100% 100% 69600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 16 weeks, 3 days ago 0|s00us8:
ZooKeeper ZOOKEEPER-3203

Tracking and exposing the non voting followers in ZK

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 25/Nov/18 19:38   07/Jan/19 12:52 07/Jan/19 08:56   3.6.0 server   0 2 0 3600   The current synced_followers metric reports all the forwarding followers, including non-voting ones.

We found it useful to track how many servers are following the leader in non-voting mode, so that we can identify issues like servers following but not issuing reconfig. This JIRA adds a separate metric reporting the number of non-voting members.
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 10 weeks, 3 days ago 0|s00ug0:
ZooKeeper ZOOKEEPER-3202

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL

Sub-task Closed Major Fixed Michael K. Edwards Michael K. Edwards Michael K. Edwards 25/Nov/18 17:39   20/May/19 13:50 14/Jan/19 09:36   3.6.0, 3.5.5     0 2 0 2400   Encountered while running tests locally:
{noformat}
[junit] 2018-11-25 22:35:31,581 [myid:2] - INFO  [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 80000 datadir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2
[junit] 2018-11-25 22:35:31,582 [myid:1] - INFO  [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 80000 datadir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2
[junit] 2018-11-25 22:35:31,581 [myid:0] - INFO  [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 80000 datadir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2
[junit] 2018-11-25 22:35:31,585 [myid:0] - INFO  [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 275 MS
[junit] 2018-11-25 22:35:31,588 [myid:2] - INFO  [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):Leader@457] - LEADING - LEADER ELECTION TOOK - 160 MS
[junit] 2018-11-25 22:35:31,582 [myid:1] - INFO  [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 155 MS
[junit] 2018-11-25 22:35:31,633 [myid:2] - INFO  [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):FileTxnSnapLog@372] - Snapshotting: 0x0 to /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2/snapshot.0
[junit] 2018-11-25 22:35:31,694 [myid:] - INFO  [main:FourLetterWordMain@87] - connecting to 127.0.0.1 11222
[junit] 2018-11-25 22:35:31,695 [myid:0] - INFO  [New I/O worker #11:NettyServerCnxn@288] - Processing stat command from /127.0.0.1:60484
[junit] 2018-11-25 22:35:31,699 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testClientServerSSL
[junit] java.lang.AssertionError: waiting for server 0 being up
[junit]     at org.junit.Assert.fail(Assert.java:88)
[junit]     at org.junit.Assert.assertTrue(Assert.java:41)
[junit]     at org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL(ClientSSLTest.java:98){noformat}
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks, 3 days ago 0|s00ueg:
ZooKeeper ZOOKEEPER-3201

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart

Sub-task Open Major Unresolved Unassigned Michael K. Edwards Michael K. Edwards 25/Nov/18 17:34   26/Nov/18 01:46           0 1   Encountered when running tests locally:
{noformat}
[junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27389. Will not attempt to authenticate using SASL (unknown error)
[junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47668, server: localhost/127.0.0.1:27389
[junit] 2018-11-25 22:28:12,734 [myid:] - INFO  [NIOWorkerThread-1:Learner@117] - Revalidating client: 0x10000a9cccf0000
[junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:27389, sessionid = 0x10000a9cccf0000, negotiated timeout = 30000
[junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27392. Will not attempt to authenticate using SASL (unknown error)
[junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:52160, server: localhost/127.0.0.1:27392
[junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27395. Will not attempt to authenticate using SASL (unknown error)
[junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47256, server: localhost/127.0.0.1:27395
[junit] 2018-11-25 22:28:13,017 [myid:] - INFO  [NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47256 as it has seen zxid 0x300000000 our last zxid is 0x2fffffffe client must try another server
[junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid 0x30000a9ccd20000, likely server has closed socket, closing socket connection and attempting reconnect
[junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:27392, sessionid = 0x20000a9d0940000, negotiated timeout = 30000
[junit] 2018-11-25 22:28:13,119 [myid:] - INFO  [main:FourLetterWordMain@87] - connecting to 127.0.0.1 27395
[junit] 2018-11-25 22:28:13,120 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@518] - Processing stat command from /127.0.0.1:47258
[junit] 2018-11-25 22:28:13,121 [myid:] - INFO  [NIOWorkerThread-1:StatCommand@53] - Stat command output
[junit] 2018-11-25 22:28:14,134 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27395. Will not attempt to authenticate using SASL (unknown error)
[junit] 2018-11-25 22:28:14,135 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47312, server: localhost/127.0.0.1:27395
[junit] 2018-11-25 22:28:14,135 [myid:] - INFO  [NIOWorkerThread-2:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47312 as it has seen zxid 0x300000000 our last zxid is 0x2fffffffe client must try another server
[junit] 2018-11-25 22:28:14,137 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid 0x30000a9ccd20000, likely server has closed socket, closing socket connection and attempting reconnect
[junit] 2018-11-25 22:28:14,240 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testRolloverThenLeaderRestart
[junit] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foofoofoo-connected
[junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
[junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
[junit]     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1942)
[junit]     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1970)
[junit]     at org.apache.zookeeper.server.ZxidRolloverTest.checkClientConnected(ZxidRolloverTest.java:119)
[junit]     at org.apache.zookeeper.server.ZxidRolloverTest.checkClientsConnected(ZxidRolloverTest.java:90)
[junit]     at org.apache.zookeeper.server.ZxidRolloverTest.start(ZxidRolloverTest.java:165)
[junit]     at org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart(ZxidRolloverTest.java:382){noformat}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 16 weeks, 3 days ago 0|s00uds:
ZooKeeper ZOOKEEPER-3200

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testInconsistentDueToNewLeaderOrder

Sub-task Open Major Unresolved Unassigned Michael K. Edwards Michael K. Edwards 25/Nov/18 13:56   25/Nov/18 13:56           0 1   https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1206/

I've seen this locally as well, in a branch where ZOOKEEPER-2778, ZOOKEEPER-1818, and ZOOKEEPER-2488 have all been addressed.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 16 weeks, 4 days ago 0|s00ua8:
ZooKeeper ZOOKEEPER-3199

Unable to produce verbose logs of Zookeeper

Bug Resolved Major Fixed Unassigned Ankit Kothana Ankit Kothana 24/Nov/18 02:35   26/Nov/18 04:22 26/Nov/18 04:22         0 1   We are using ZooKeeper in our system along with Apache Kafka. However, ZooKeeper is not producing any relevant logs in the log file (even with lower log levels specified in log4j.properties) that could help us identify what is going on in the ZK or Kafka cluster.

Please let us know how to retrieve proper logs from ZK cluster.

Version of ZK : 3.4.13
ZooKeeper ZOOKEEPER-3198

Handle port-binding failures in a systematic and documented fashion

Improvement Open Major Unresolved Unassigned Michael K. Edwards Michael K. Edwards 22/Nov/18 16:57   05/Feb/20 07:16   3.5.3, 3.6.0, 3.4.13 3.7.0, 3.5.8     0 1   Many test failures appear to result from bind failures due to port conflicts. This can arise in normal use as well. Presently the code swallows the exception (with an error log) at a low level. It would probably be useful to throw the exception far enough up the stack to trigger retry with a new port (in tests) or a high-level (perhaps even fatal) error message (in normal use).
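The suggested behavior can be sketched as follows: let the bind failure propagate and retry at a higher level. Everything below (class and method names) is a hypothetical illustration, not the current ZooKeeper code.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: instead of swallowing the BindException at a low level,
// surface it so the caller can retry on another port or fail loudly.
public class BindWithRetry {
    // Probe successive candidate ports; rethrow the last failure if all are taken.
    public static ServerSocket bindWithRetry(int firstPort, int maxAttempts) throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return new ServerSocket(firstPort + i);
            } catch (IOException e) {
                last = e; // port in use: remember the failure and try the next port
            }
        }
        throw last; // propagate far enough up the stack for the caller to decide
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket s = bindWithRetry(0, 1)) { // port 0 lets the OS pick a free port
            System.out.println("bound to " + s.getLocalPort());
        }
    }
}
```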
ZooKeeper ZOOKEEPER-3197

Improve documentation in ZooKeeperServer.superSecret

Task Closed Trivial Fixed Unassigned Colm O hEigeartaigh Colm O hEigeartaigh 22/Nov/18 11:18   20/May/19 13:50 07/Jan/19 09:34   3.6.0, 3.5.5     0 4 0 3000   A security scan flagged the use of a hard-coded secret (ZooKeeperServer.superSecret) in conjunction with a java Random instance to generate a password:

{code:java}
byte[] generatePasswd(long id) {
    Random r = new Random(id ^ superSecret);
    byte p[] = new byte[16];
    r.nextBytes(p);
    return p;
}
{code}

superSecret has the following javadoc:

 /**
   * This is the secret that we use to generate passwords, for the moment it
   * is more of a sanity check.
   */

It is unclear from this comment and from looking at the code why this is not a security risk. It would be helpful to update the javadoc along the lines of "Using a hard-coded secret with Random to generate a password is not a security risk because the resulting passwords are used for X, Y, Z and not for authentication", for anyone else looking at the code.
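For context, the method is deterministic: the same session id always yields the same 16 bytes, so any server in the ensemble can re-derive and compare the value, which is why it acts as a sanity check rather than a secret credential. A self-contained sketch (the constant below is an illustrative stand-in for ZooKeeperServer.superSecret, not necessarily the exact value in the source):

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of what generatePasswd does: the "password" is a pure function of the
// session id and a fixed secret, so it can be re-derived anywhere for checking.
public class SessionPasswd {
    static final long SUPER_SECRET = 0xB3415C00L; // illustrative stand-in constant

    static byte[] generatePasswd(long id) {
        Random r = new Random(id ^ SUPER_SECRET);
        byte[] p = new byte[16];
        r.nextBytes(p);
        return p;
    }

    public static void main(String[] args) {
        // Deterministic: two servers derive the identical password for one session id.
        System.out.println(Arrays.equals(generatePasswd(42L), generatePasswd(42L))); // true
        // Different sessions get different passwords.
        System.out.println(Arrays.equals(generatePasswd(42L), generatePasswd(43L))); // false
    }
}
```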
ZooKeeper ZOOKEEPER-3196

Maintain the configuration used by the server stabilizer. It can be overridden based on the server type and the server system internals.

Improvement Open Minor Unresolved Unassigned Venkateswarlu Tumati Venkateswarlu Tumati 21/Nov/18 09:23   14/Dec/19 06:06   3.6.0 3.7.0 server   0 1 0 600   Maintain the configuration used by the server stabilizer. It can be overridden based on the server type and the server system internals.

 

- Avoid calculating the globalOutstandingLimit for every request, as it does not change between requests.

- We read globalOutstandingLimit from the system property and parse the value on every call to shouldThrottle. It could instead be taken from the config, which would act as a cache.
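A minimal sketch of the proposed caching, with illustrative names (not the actual ZooKeeperServer code): the property is parsed once in the constructor, and shouldThrottle only reads a field.

```java
// Sketch of the proposed caching: parse zookeeper.globalOutstandingLimit once
// instead of reading and parsing the system property on every shouldThrottle call.
public class ThrottleConfig {
    // Parsed once at startup (or on reconfig) and reused, acting as the cache.
    private final int globalOutstandingLimit;

    public ThrottleConfig() {
        globalOutstandingLimit =
            Integer.getInteger("zookeeper.globalOutstandingLimit", 1000);
    }

    public boolean shouldThrottle(long outstandingRequests) {
        // No property lookup or string parsing here, just a field read.
        return outstandingRequests >= globalOutstandingLimit;
    }

    public static void main(String[] args) {
        ThrottleConfig cfg = new ThrottleConfig();
        System.out.println(cfg.shouldThrottle(5000)); // true with the default limit of 1000
    }
}
```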
ZooKeeper ZOOKEEPER-3195

TLS - disable client-initiated renegotiation

Improvement Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 20/Nov/18 18:28   20/May/19 13:50 14/Jan/19 13:40 3.6.0, 3.5.5 3.6.0, 3.5.5     0 3 0 11400   Client-initiated TLS renegotiation is not secure and exposes the connection to MITM attacks. Unfortunately, Java's TLS implementation allows it by default. Thankfully, it is easy to disable.
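One JVM-level way to disable it is the jdk.tls.rejectClientInitiatedRenegotiation system property, which the JDK honors for server-side TLS engines and sockets. A hedged sketch (the property must be set before any TLS machinery is initialized; whether ZooKeeper uses exactly this mechanism is not stated here):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Sketch: setting jdk.tls.rejectClientInitiatedRenegotiation=true makes
// server-side JSSE drop client-initiated renegotiation attempts.
public class DisableRenegotiation {
    public static void main(String[] args) throws Exception {
        // Must be set before the first TLS context/engine is created.
        System.setProperty("jdk.tls.rejectClientInitiatedRenegotiation", "true");

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null);
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false); // the property only affects server-side engines
        System.out.println(System.getProperty("jdk.tls.rejectClientInitiatedRenegotiation"));
    }
}
```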
ZooKeeper ZOOKEEPER-3194

ZOOKEEPER-3451 Quorum TLS - fix copy/paste bug in ZKTrustManager

Sub-task Closed Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 20/Nov/18 18:21   01/Jul/19 10:53 27/Nov/18 15:07 3.6.0, 3.5.5 3.6.0, 3.5.5 security   0 3 0 4800   There is an obvious copy/paste bug in ZKTrustManager: ZKTrustManager.checkClientTrusted() is calling x509ExtendedTrustManager.checkServerTrusted(). It should call x509ExtendedTrustManager.checkClientTrusted() instead.
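The intended delegation can be sketched with a minimal wrapper (illustrative, not the actual ZKTrustManager code): each check*Trusted method forwards to the same-named method on the delegate.

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// Minimal sketch of the fixed delegation: client checks go to checkClientTrusted,
// server checks to checkServerTrusted. The bug was the client path delegating to
// the server method.
public class DelegatingTrustManager implements X509TrustManager {
    private final X509TrustManager delegate;

    public DelegatingTrustManager(X509TrustManager delegate) {
        this.delegate = delegate;
    }

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        delegate.checkClientTrusted(chain, authType); // fixed: was checkServerTrusted
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        delegate.checkServerTrusted(chain, authType);
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return delegate.getAcceptedIssuers();
    }

    // Probe used by main/tests: records which delegate method a client check hits.
    static String probe() throws CertificateException {
        final StringBuilder called = new StringBuilder();
        X509TrustManager stub = new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] c, String a) { called.append("client"); }
            public void checkServerTrusted(X509Certificate[] c, String a) { called.append("server"); }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        };
        new DelegatingTrustManager(stub).checkClientTrusted(new X509Certificate[0], "RSA");
        return called.toString();
    }

    public static void main(String[] args) throws CertificateException {
        System.out.println(probe()); // client
    }
}
```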
ZooKeeper ZOOKEEPER-3193

ZOOKEEPER-3170 Flaky: org.apache.zookeeper.test.SaslAuthFailNotifyTest

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 19/Nov/18 09:42   20/May/19 13:50 27/Nov/18 03:56 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5 tests   0 3 0 2400   This test doesn't fail often on Apache Jenkins, but seems quite flaky in our in-house testing environment. It has a race waiting for the AuthFailed event, which can happen before client creation succeeds, causing the wait operation to hang forever (notify occurred before the wait() call). Using a CountDownLatch would be better for this purpose.
ZooKeeper ZOOKEEPER-3192

zoo_multi/zoo_amulti crash

Bug Open Blocker Unresolved Unassigned Jian Wang Jian Wang 15/Nov/18 20:16   15/Nov/18 20:16   3.4.13   c client   0 1   Building with VS2013 (/MD), the zoo_amulti function (zookeeper.c) appears to have an initialization problem.
{code:java}
struct RequestHeader h = { STRUCT_INITIALIZER(xid, get_xid()), STRUCT_INITIALIZER(type, ZOO_MULTI_OP) };
struct MultiHeader mh = { STRUCT_INITIALIZER(type, -1), STRUCT_INITIALIZER(done, 1), STRUCT_INITIALIZER(err, -1) };
struct oarchive *oa = create_buffer_oarchive();
completion_head_t clist = { 0 };
{code}
The members cond and lock of the variable "clist" are not initialized correctly. They should be initialized with pthread_cond_init and pthread_mutex_init. Otherwise zoo_amulti crashes when queue_completion is called, which calls pthread_cond_broadcast on clist->cond.
ZooKeeper ZOOKEEPER-3191

Code clean up

Improvement Open Minor Unresolved Unassigned Artem Chernatsky Artem Chernatsky 15/Nov/18 10:24   18/Feb/19 22:34           0 4   Working on a feature for Zookeeper, I've found strange and redundant code in several places, like:
* 1. [if true set true|https://github.com/apache/zookeeper/blob/a859410aea35dfef5fa54c99fb8a5bfc81f1a46b/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L388-L398]
* 2. [redundant code|https://github.com/apache/zookeeper/blob/a859410aea35dfef5fa54c99fb8a5bfc81f1a46b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L124-L126]
* 3. [redundant code|https://github.com/apache/zookeeper/blob/a859410aea35dfef5fa54c99fb8a5bfc81f1a46b/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L462]

It makes the code difficult to work with. So maybe it should be cleaned up?
ZooKeeper ZOOKEEPER-3190

Spell check on the Zookeeper server files

Improvement Resolved Minor Fixed Dinesh Appavoo Dinesh Appavoo Dinesh Appavoo 13/Nov/18 20:00   29/Mar/19 16:48 17/Nov/18 12:37   3.6.0 documentation, other   0 4 0 2400   This JIRA is to do spell check on the zookeeper server files [ zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ].


ZooKeeper ZOOKEEPER-3189

Support new configuration syntax for resilient network feature

Bug Open Major Unresolved Unassigned Ted Dunning Ted Dunning 13/Nov/18 02:31   13/Nov/18 02:32           0 1    

There are simultaneous ongoing efforts to support network resilience (3188, blocking this issue) and a new configuration syntax (3166, also blocking this issue).

This issue captures the fact that the new syntax will need to be supported by the network resilience code; both features are prerequisites for that support.

 
ZooKeeper ZOOKEEPER-3188

Improve resilience to network

Improvement Resolved Major Fixed Mate Szalay-Beko Ted Dunning Ted Dunning 12/Nov/18 11:44   29/Nov/19 08:50 29/Nov/19 08:49   3.6.0     0 4 0 43800   We propose to add network level resiliency to Zookeeper. The ideas that we have on the topic have been discussed on the mailing list and via a specification document that is located at [https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]

That document is copied to this issue which is being created to report the results of experimental implementations.
h1. Zookeeper Network Resilience
h2. Background

Zookeeper is designed to help in building distributed systems. It provides a variety of operations for doing this and all of these operations have rather strict guarantees on semantics. Zookeeper itself is a distributed system made up of a cluster containing a leader and a number of followers. The leader is designated in a process known as leader election in which a majority of all nodes in the cluster must agree on a leader. All subsequent operations are initiated by the leader and completed when a majority of nodes have confirmed the operation. Whenever an operation cannot be confirmed by a majority or whenever the leader goes missing for a time, a new leader election is conducted and normal operations proceed once a new leader is confirmed.

 

The details of this are not important relative to this discussion. What is important is that the semantics of the operations conducted by a Zookeeper cluster and the semantics of how client processes communicate with the cluster depend only on the basic fact that messages sent over TCP connections will never arrive out of order or go missing. Central to the design of ZK is that a server to server network connection is used as long as it works, and a new connection is made when it appears that the old connection isn't working.

 

As currently implemented, however, each member of a Zookeeper cluster can have only a single address as viewed from some other process. This means, absent network link bonding, that the loss of a single switch or a few network connections could completely stop the operations of the Zookeeper cluster. It is the goal of this work to address this issue by allowing each server to listen on multiple network interfaces and to connect to other servers via any of several addresses. The effect will be to allow servers to communicate over redundant network paths to improve resiliency to network failures without changing any core algorithms.
h2. Proposed Change

Interestingly, the correct operations of a Zookeeper cluster do not depend on _how_ a TCP connection was made. There is no reason at all not to advertise multiple addresses for members of a Zookeeper cluster.

 

Connections between members of a Zookeeper cluster and between a client and a cluster member are established by referencing a configuration file (for cluster members) that specifies the address of all of the nodes in a cluster or by using a connection string containing possible addresses of Zookeeper cluster members. As soon as a connection is made, any desired authentication or encryption layers are added and the connection is handed off to the client communications layer or the server to server logic.

This means that the only thing that actually needs to change to allow Zookeeper servers to be accessible on multiple networks is a change in the server configuration file format to allow the multiple addresses to be specified and to update the code that establishes the TCP connection to make use of these multiple addresses. No code changes are actually needed on the client since we can simply supply all possible server addresses. The client already has logic for selecting a server address at random and it doesn’t really matter if these addresses represent synonyms for the same server. All that matters is that _some_ connection to a server is established.
h2. Configuration File Syntax Change

The current Zookeeper syntax looks like this:

 

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

 

The only lines that matter for this discussion are the last three. These specify the addresses for each of the servers that are part of the Zookeeper cluster as well as the port numbers used for the servers to talk to each other.

 

I propose that the current syntax of these lines be augmented to allow a comma delimited list of addresses. For the current example, we might have this:

 

server.1=zoo1-net1:2888:3888,zoo1-net2:2888:3888
server.2=zoo2-net1:2888:3888,zoo2-net2:2888:3888
server.3=zoo3-net1:2888:3888

 

The first two servers are available via two different addresses, presumably on separate networks while the third server only has a single address. In practice, we would probably specify multiple addresses for all the servers, but that isn’t necessary for this proposal. There is work ongoing to improve and generalize the syntax for configuring Zookeeper clusters. As that work progresses, it will be necessary to figure out appropriate extensions to allow multiple addresses in the new and improved syntax. Nothing blocks the current proposal from being implemented in current form and adapted later for the new syntax.
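The proposed comma-delimited value could be parsed with a simple split. The following is an illustrative sketch, not the actual QuorumPeerConfig parser:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of parsing the proposed multi-address syntax, e.g.
// "server.1=zoo1-net1:2888:3888,zoo1-net2:2888:3888".
public class ServerLineParser {
    // Split the value side of a server.N line into individual address specs.
    public static List<String> parseAddressSpecs(String line) {
        String value = line.substring(line.indexOf('=') + 1);
        List<String> specs = new ArrayList<>();
        for (String spec : value.split(",")) {
            String[] parts = spec.split(":");
            if (parts.length < 3) { // need at least host:port1:port2
                throw new IllegalArgumentException("expected host:port1:port2, got " + spec);
            }
            specs.add(spec.trim());
        }
        return specs;
    }

    public static void main(String[] args) {
        System.out.println(parseAddressSpecs(
            "server.1=zoo1-net1:2888:3888,zoo1-net2:2888:3888"));
    }
}
```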

 

When a server tries to connect to another server, it would simply shuffle the available addresses at random and try to connect using successive addresses until a connection succeeds or all addresses have been tried.
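That shuffle-and-try strategy can be sketched as follows; the Predicate stands in for a real TCP connect attempt, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the connect strategy: shuffle the candidate addresses, then try
// each in turn until one accepts the connection or all have been tried.
public class ShuffledConnect {
    public static String connectToAny(List<String> addresses, Predicate<String> tryConnect) {
        List<String> shuffled = new ArrayList<>(addresses);
        Collections.shuffle(shuffled); // random order spreads load across paths
        for (String addr : shuffled) {
            if (tryConnect.test(addr)) {
                return addr; // first address that accepts the connection wins
            }
        }
        return null; // all addresses tried, none reachable
    }

    public static void main(String[] args) {
        List<String> addrs = List.of("zoo1-net1:2888", "zoo1-net2:2888");
        // Pretend only the net2 path is up.
        System.out.println(connectToAny(addrs, a -> a.contains("net2")));
    }
}
```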

 

The complete syntax for server lines in a Zookeeper configuration file in BNF is

 

<server-line> ::= "server."<integer> "=" <address-spec>

<address-spec> ::= <server-address>[<client-address>]

<server-address> ::= <address>:<port1>:<port2>[:<role>]

<client-address> ::= ;[<client address>:]<client port>

 

After this change, the syntax would look like this:

 

<server-line> ::= "server."<integer> "=" <address-list>

<address-list> ::= <address-spec>[,<address-list>]

<address-spec> ::= <server-address>[<client-address>]

<server-address> ::= <address>:<port1>:<port2>[:<role>]

<client-address> ::= ;[<client address>:]<client port>

 
h2. Dynamic Reconfiguration

From version 3.5, Zookeeper has the ability to change the configuration of the cluster dynamically. This can involve the atomic change of any of the configuration parameters that are dynamically configurable. These include, notably for the purposes here, the addresses of the servers in the cluster. In order to simplify this, the configuration file post 3.5 is split into static configuration that cannot be changed on the fly and dynamic configuration that can be changed. When a new configuration is committed by the cluster, the dynamic configuration file is simply overwritten and used.

 

This means that extending the configuration file syntax to support multiple addresses is sufficient to support dynamic reconfiguration.
h2. Client Connections

When client connections are initially made, the client library is given a list of servers to contact. Servers are selected at random until a connection is made or the patience of the library implementers is exhausted. This requires no changes to support multiple network links per server, except insofar as servers with more network connections will wind up with more client connections unless some action is taken. What will be done is to find the server with the most addresses and add duplicates of some address for every other server until every server is mentioned the same number of times. For cases where all servers have identical numbers of network connections, this will cause no change. It is expected that an imbalance will arise only as a transient condition while a cluster is being reconfigured or if some servers are added to a cluster temporarily during maintenance operations.
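The padding step described above can be sketched as follows (illustrative names, not real ZooKeeper code): every server's address list is padded with duplicates until each server contributes the same number of entries to the client connect list.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the balancing step: pad each server's addresses with duplicates so
// every server is mentioned the same number of times, giving servers equal
// probability under random selection.
public class BalancedConnectList {
    public static List<String> balance(List<List<String>> perServerAddresses) {
        int max = 0;
        for (List<String> addrs : perServerAddresses) {
            max = Math.max(max, addrs.size()); // server with the most addresses
        }
        List<String> result = new ArrayList<>();
        for (List<String> addrs : perServerAddresses) {
            for (int i = 0; i < max; i++) {
                result.add(addrs.get(i % addrs.size())); // repeat addresses to pad
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(balance(List.of(
            List.of("zoo1-net1", "zoo1-net2"),
            List.of("zoo2-net1", "zoo2-net2"),
            List.of("zoo3-net1"))));
        // zoo3-net1 is duplicated, so every server appears twice
    }
}
```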

 

More interesting is the fact that when a connection is made to a Zookeeper cluster, the server responds with a list of the servers in the cluster. We will need to arrange that the list contains all available addresses in the Zookeeper cluster, but will not need to make any other changes. As mentioned before, some addresses might be duplicated to make sure that all servers have equal probability of being selected by a client.
ZooKeeper ZOOKEEPER-3187

Apache zookeeper 3.5.3-beta security vulnerabilities CVE-2018-8012

Improvement Open Major Unresolved Unassigned Mujassim Sheikh Mujassim Sheikh 12/Nov/18 08:52   12/Nov/18 12:11           0 2   Apache zookeeper 3.5.3-beta is vulnerable to CVE-2018-8012.
ZooKeeper ZOOKEEPER-3186

bug in barrier example code

Bug Open Major Unresolved Unassigned cheng pan cheng pan 09/Nov/18 01:27   21/Nov/18 21:06       documentation   0 3   The code given in the documentation:
{code:java}
while (true) {
    synchronized (mutex) {
        List<String> list = zk.getChildren(root, true);
        if (list.size() < size) {
            mutex.wait();
        } else {
            return true;
        }
    }
}
{code}
When some nodes are not ready, the code calls mutex.wait() and waits for the watcher event to call mutex.notify() to wake it up. The problem is that we can't guarantee mutex.notify() happens after mutex.wait(), which causes the client to get stuck.
A CountDownLatch might be the solution.
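A latch-based sketch of why CountDownLatch avoids the lost-wakeup problem: a countDown() that happens before await() is remembered, while a notify() before wait() is lost. This is a hypothetical stand-in, not the recipe code from the documentation:

```java
import java.util.concurrent.CountDownLatch;

// Sketch: even if the watcher fires before the client starts waiting, the latch
// remembers the signal, so await() returns immediately instead of hanging.
public class LatchBarrier {
    public static boolean enter() throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);
        // Simulate the watcher firing *before* the client begins to wait.
        ready.countDown();
        // With wait()/notify() this ordering would block forever; the latch
        // state persists, so this returns at once.
        ready.await();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(enter()); // true
    }
}
```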
ZooKeeper ZOOKEEPER-3185

After the skipACL flag is enabled, the ACL of a created node becomes 'auth,'. This causes the node to be unreadable after skipACL is disabled.

Bug Open Major Unresolved maoling ZHU CHONG ZHU CHONG 01/Nov/18 04:45   10/Mar/20 06:45   3.4.12   security, server   0 1   1.

Modify the configuration file zoo.cfg: set skipACL=yes.

2.

create  /test  null digest:test:ooOS6Ac+VQuWIVe96Ts+Phqg0LU=:cdrwa

123 is the password; ooOS6Ac+VQuWIVe96Ts+Phqg0LU= is the ciphertext.

3.

getAcl /test
'auth,'
: cdrwa

4.

Modify the configuration file zoo.cfg: set skipACL=no.

5.

addauth  digest test:123

6.

get /test

Authentication is not valid : /test
ZooKeeper ZOOKEEPER-3184

ZOOKEEPER-925 Use the same method to generate website as documentation

Sub-task Resolved Major Fixed Tamas Penzes Tamas Penzes Tamas Penzes 29/Oct/18 08:13   07/Dec/18 06:33 07/Dec/18 06:33         0 1 0 13200   We should use the same method to generate the website as we do the documentation.

This way we would get rid of Jekyll and would not need to install anything on the build machines.
ZooKeeper ZOOKEEPER-3183

Interrupt or notify the WatcherCleaner thread during shutdown if it is waiting for the dead watchers to reach a certain number (watcherCleanThreshold), and stop adding incoming dead watchers to deadWatchersList when shutdown is initiated.

Improvement Resolved Minor Fixed Unassigned Venkateswarlu Tumati Venkateswarlu Tumati 28/Oct/18 01:45   28/Nov/18 21:36 28/Nov/18 17:44   3.6.0 server   0 2 0 26400   Interrupt or notify the WatcherCleaner thread during shutdown if it is waiting for the dead watchers to reach a certain number (watcherCleanThreshold), and stop adding incoming dead watchers to deadWatchersList when shutdown is initiated.
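A sketch of a shutdown-aware wait loop of the kind described (all names are illustrative, not the actual WatcherCleaner code): the waiting thread re-checks a shutdown flag, and shutdown() wakes it so it can exit instead of waiting for the threshold.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: the cleaner waits for watcherCleanThreshold dead watchers OR a
// shutdown signal, and new dead watchers are rejected once shutdown starts.
public class CleanerShutdown {
    private final Object cleanEvent = new Object();
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private int deadWatchers = 0;
    private final int threshold;

    public CleanerShutdown(int threshold) { this.threshold = threshold; }

    public void addDeadWatcher() {
        synchronized (cleanEvent) {
            if (stopped.get()) {
                return; // stop accepting dead watchers once shutdown is initiated
            }
            deadWatchers++;
            cleanEvent.notifyAll();
        }
    }

    // Returns true if the threshold was reached, false if shut down while waiting.
    public boolean awaitBatch() throws InterruptedException {
        synchronized (cleanEvent) {
            while (deadWatchers < threshold && !stopped.get()) {
                cleanEvent.wait();
            }
            return deadWatchers >= threshold;
        }
    }

    public void shutdown() {
        synchronized (cleanEvent) {
            stopped.set(true);
            cleanEvent.notifyAll(); // wake the cleaner so it can exit promptly
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CleanerShutdown c = new CleanerShutdown(100);
        c.shutdown();                       // shutdown before the threshold is reached
        System.out.println(c.awaitBatch()); // false: woke up due to shutdown
    }
}
```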
ZooKeeper ZOOKEEPER-3182

Race condition when follower syncing with leader and starting to serve requests

Bug Open Critical Unresolved Unassigned Andor Molnar Andor Molnar 25/Oct/18 09:41   04/Oct/19 10:55   3.6.0   server   0 6   This issue was probably introduced by ZOOKEEPER-2024, where 2 separate queues were implemented in CommitProcessor to improve performance. [~abrahamfine]'s analysis is accurate on GitHub: https://github.com/apache/zookeeper/pull/300

He was trying to introduce synchronization between Learner.syncWithLeader() and CommitProcessor to wait for in-flight requests to be committed before accepting client requests.

In the affected unit test ({{testNodeDataChanged}}) there's a race between the reconnecting client's setWatches request and updates coming from the leader, according to the following logs:

{noformat}
2018-10-25 13:59:58,556 [myid:] - DEBUG [FollowerRequestProcessor:1:CommitProcessor@424] - Processing request:: sessionid:0x10005d8fc4d0000 type:setWatches cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2018-10-25 13:59:58,556 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x10005d8fc4d0000 type:setWatches cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
...
2018-10-25 13:59:58,557 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x20005d8f8a40000 type:delete cxid:0x1 zxid:0x100000004 txntype:2 reqpath:n/a
...
2018-10-25 13:59:58,561 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x20005d8f8a40000 type:create cxid:0x2 zxid:0x100000005 txntype:1 reqpath:n/a
2018-10-25 13:59:58,561 [myid:127.0.0.1:11231] - DEBUG [main-SendThread(127.0.0.1:11231):ClientCnxn$SendThread@864] - Got WatchedEvent state:SyncConnected type:NodeDeleted path:/test-changed for sessionid 0x10005d8fc4d0000
{noformat}

{{setWatches}} request is processed before {{delete}} and {{create}}, hence the client receives NodeDeleted event.

In the working scenario it looks like:

{noformat}
2018-10-25 14:04:55,247 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x20005dd88110000 type:delete cxid:
0x1 zxid:0x100000004 txntype:2 reqpath:n/a
2018-10-25 14:04:55,249 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x20005dd88110000 type:create cxid:
0x2 zxid:0x100000005 txntype:1 reqpath:n/a
...
2018-10-25 14:04:56,314 [myid:] - DEBUG [FollowerRequestProcessor:1:CommitProcessor@424] - Processing request:: sessionid:0x10005dd88110000 type:setWatches cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2018-10-25 14:04:56,315 [myid:] - DEBUG [CommitProcWorkThread-1:FinalRequestProcessor@91] - Processing request:: sessionid:0x10005dd88110000 type:setWatches cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
...
2018-10-25 14:04:56,316 [myid:127.0.0.1:11231] - DEBUG [main-SendThread(127.0.0.1:11231):ClientCnxn$SendThread@842] - Got notification sessionid:0x10005dd88110000
2018-10-25 14:04:56,316 [myid:127.0.0.1:11231] - DEBUG [main-SendThread(127.0.0.1:11231):ClientCnxn$SendThread@864] - Got WatchedEvent state:SyncConnected type:NodeDataChanged path:/test-changed for sessionid 0x10005dd88110000
{noformat}

{{delete}} and {{create}} requests happen way before {{setWatches}} comes in (even before the client connection is established) and client receives NodeDataChanged event only.

Abe's approach unfortunately raises the following concerns:
- it modifies CommitProcessor's code, which might affect performance and correctness (a concern [~shralex] raised on ZOOKEEPER-2807),
- we experienced deadlocks while testing the patch: https://github.com/apache/zookeeper/pull/300

As a consequence I raised this Jira to capture the experiences and to put the unit test on the Ignore list, because I'm currently not sure whether this is a real issue or a non-backward-compatible change in 3.6 that comes with a huge performance improvement.

Either way, I don't want this flaky test to influence contributions, so I'll mark it as Ignored on trunk until the issue is resolved.
ZooKeeper ZOOKEEPER-3181

ZOOKEEPER-2355 broke Curator TestingQuorumPeerMain

Bug Resolved Major Not A Problem Unassigned Akira Ajisaka Akira Ajisaka 24/Oct/18 00:00   28/Nov/18 01:27 28/Nov/18 01:27 3.5.3, 3.4.11       0 3 0 11400   ZOOKEEPER-2355 added a getQuorumPeer method to QuorumPeerMain [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java#L194]. TestingQuorumPeerMain has an identically named method, which is now unintentionally overriding the one in the base class.

This is fixed by CURATOR-409; however, I'd like this to be fixed in ZooKeeper as well.
ZooKeeper ZOOKEEPER-3180

Add response cache to improve the throughput of read heavy traffic

Improvement Resolved Minor Fixed Brian Nixon Fangmin Lv Fangmin Lv 19/Oct/18 17:54   26/Sep/19 02:26 14/Jan/19 13:39   3.6.0 server   1 7 0 17400   On read-heavy use cases with large response data sizes, serialization of responses takes time and adds overhead to the GC.

Adding a response cache helps improve the throughput we can support and also reduces latency in general.

This Jira implements an LRU cache for responses, which has shown some performance gain on some of our production ensembles.
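An LRU response cache can be sketched on top of java.util.LinkedHashMap in access order; this is an illustrative stand-in, not the actual ZooKeeper ResponseCache class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU response cache: a cache hit returns the already-serialized
// response bytes, skipping re-serialization; when full, evict the least
// recently used entry.
public class ResponseCache<K, V> {
    private final Map<K, V> cache;

    public ResponseCache(final int maxSize) {
        // accessOrder=true moves entries to the tail on get(), so the head is LRU.
        cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public synchronized void put(K key, V serializedResponse) {
        cache.put(key, serializedResponse);
    }

    public synchronized V get(K key) {
        return cache.get(key); // a hit skips re-serializing the response
    }

    public static void main(String[] args) {
        ResponseCache<String, byte[]> c = new ResponseCache<>(2);
        c.put("/a", new byte[]{1});
        c.put("/b", new byte[]{2});
        c.get("/a");                 // touch /a so /b becomes least recently used
        c.put("/c", new byte[]{3});  // evicts /b
        System.out.println(c.get("/b") == null); // true
    }
}
```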
ZooKeeper ZOOKEEPER-3179

Add snapshot compression to reduce the disk IO

Improvement Resolved Major Fixed Yisong Yue Fangmin Lv Fangmin Lv 19/Oct/18 17:45   02/May/19 04:46 10/Apr/19 07:19   3.6.0     1 6 0 16800   As the snapshot becomes larger, the periodic snapshot taken after a certain number of txns becomes more expensive, which in turn affects the maximum throughput we can support within SLA because of the disk contention between snapshot and txn writes when they're on the same drive.
 
With compression like zstd/snappy/gzip, the actual snapshot size could be much smaller; the compression ratio depends on the actual data. It might make recovery time (loading from disk) faster in some cases, but will sometimes take longer because of the extra time spent compressing/decompressing.
 
Based on production traffic, performance also varies with the compression method; that's why we provide different implementations, so a different compression method can be selected for different use cases.
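The wrap-on-write/unwrap-on-read idea can be sketched with the JDK's built-in gzip (zstd and snappy would need third-party libraries); this is illustrative, not the actual SnapStream code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: wrap the snapshot stream in a compressing stream on write and a
// decompressing stream on load; the snapshot bytes round-trip unchanged.
public class SnapshotGzip {
    public static byte[] compress(byte[] snapshot) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(snapshot);
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] snapshot = new byte[64 * 1024]; // highly compressible fake snapshot
        byte[] packed = compress(snapshot);
        System.out.println(packed.length < snapshot.length);              // true
        System.out.println(Arrays.equals(decompress(packed), snapshot)); // true
    }
}
```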
 
ZooKeeper ZOOKEEPER-3178

Remove PrepRequestProcessor from RO ZooKeeperServer to avoid txns being created in RO mode

Bug Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 16/Oct/18 11:24   14/Dec/19 06:06     3.7.0 server   0 1   For some reason, the ReadOnlyZooKeeperServer was implemented with PrepRequestProcessor, which is meaningless and error-prone, since all it does is prepare txns, and we shouldn't allow txns to be created on a non-leader server.
 
This causes dangling global sessions on an RO observer, because a createSession txn is generated and the code treats it as a global session and adds it to the snapshot.
 
ZooKeeper ZOOKEEPER-3177

Refactor request throttle logic in NIO and Netty to keep the same behavior and make the code easier to maintain

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 16/Oct/18 11:08   07/Dec/18 14:52 17/Nov/18 12:34   3.6.0 server   0 4 0 12600   There is shouldThrottle logic in zkServer; we should use it in NIO as well, and refactor the code to make it cleaner and easier to maintain in the future.
ZooKeeper ZOOKEEPER-3176

ZOOKEEPER-3451 Quorum TLS - add SSL config options

Sub-task Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 15/Oct/18 18:48   01/Jul/19 10:54 24/Jan/19 05:53 3.6.0, 3.5.5 3.6.0, 3.5.5     0 3 0 18000   Some parameters of Quorum TLS connections are not currently configurable. Let's add configuration properties for them with reasonable defaults. In particular, these are:
* enabled protocols
* client auth behavior (want / need / none)
* a timeout for TLS handshake detection in a UnifiedServerSocket
ZooKeeper ZOOKEEPER-3175

ZOOKEEPER-3451 Quorum TLS - test improvements

Sub-task Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 15/Oct/18 18:45   01/Jul/19 10:53 07/Nov/18 11:09 3.6.0, 3.5.5 3.6.0, 3.5.5     0 1   To simplify testing of Quorum TLS features, let's encapsulate the functionality of creating test trust/key stores in some helper classes so it can be easily shared between different unit tests.
ZooKeeper ZOOKEEPER-3174

ZOOKEEPER-3451 Quorum TLS - support reloading trust/key store

Sub-task Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 15/Oct/18 18:43   01/Jul/19 10:53 19/Dec/18 07:49 3.6.0, 3.5.5 3.6.0, 3.5.5     0 2 0 45600   The Quorum TLS feature recently added in ZOOKEEPER-236 doesn't support reloading a trust/key store from disk when it changes. In an environment where short-lived certificates are used and are refreshed by some background daemon / cron job, this is a problem. Let's support reloading a trust/key store from disk when the file on disk changes.
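A minimal sketch of detecting a changed store by polling the file's modification time (the real feature could equally use a WatchService); names and the reload hook are illustrative, not the actual ZooKeeper code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Sketch: a background check compares the store file's mtime against the last
// seen value; on change, the reload hook would rebuild the KeyStore/SSLContext.
public class StoreReloader {
    private final Path storePath;
    private FileTime lastSeen;
    private int reloads = 0;

    public StoreReloader(Path storePath) throws IOException {
        this.storePath = storePath;
        this.lastSeen = Files.getLastModifiedTime(storePath);
    }

    // Call periodically (e.g. from a background thread); returns true on reload.
    public boolean checkAndReload() throws IOException {
        FileTime now = Files.getLastModifiedTime(storePath);
        if (!now.equals(lastSeen)) {
            lastSeen = now;
            reloads++; // real code: reload the KeyStore and rebuild the SSLContext here
            return true;
        }
        return false;
    }

    public int reloadCount() { return reloads; }

    public static void main(String[] args) throws IOException {
        Path store = Files.createTempFile("keystore", ".pem");
        StoreReloader r = new StoreReloader(store);
        System.out.println(r.checkAndReload()); // false: unchanged
        Files.setLastModifiedTime(store, FileTime.fromMillis(System.currentTimeMillis() + 5000));
        System.out.println(r.checkAndReload()); // true: store was touched
    }
}
```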
ZooKeeper ZOOKEEPER-3173

ZOOKEEPER-3451 Quorum TLS - support PEM trust/key stores

Sub-task Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 15/Oct/18 18:40   01/Jul/19 10:53 06/Nov/18 20:30 3.6.0, 3.5.5 3.6.0, 3.5.5     0 3 0 52800   ZOOKEEPER-236 has landed, so there is some TLS support in Zookeeper now, but only JKS trust stores are supported. JKS is not really used by non-Java software, where PKCS12 and PEM are more standard. Let's add support for PEM trust / key stores to make Quorum TLS easier to use.
ZooKeeper ZOOKEEPER-3172

ZOOKEEPER-3451 Quorum TLS - fix port unification to allow rolling upgrades

Sub-task Closed Major Fixed Ilya Maykov Ilya Maykov Ilya Maykov 15/Oct/18 18:33   01/Jul/19 10:53 27/Nov/18 11:57 3.6.0, 3.5.5 3.6.0, 3.5.5 security, server   0 3 0 37800   ZOOKEEPER-236 was committed with port unification support disabled, because of various issues with the implementation. These issues should be fixed so port unification can be enabled again. Port unification is necessary to upgrade an ensemble from plaintext to TLS quorum connections without downtime.
ZooKeeper ZOOKEEPER-3171

ZOOKEEPER-3021 Create pom.xml for recipes and contrib

Sub-task Closed Blocker Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 15/Oct/18 06:42   02/Apr/19 06:40 11/Jan/19 04:18 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 3600   After the directory structure has been created, it is time to create the pom files for all the modules and to create the build hierarchy.
At first, Ant should remain in place until we are sure Maven works fine.

After the Maven build is stable for jute, server, client and common, recipes and contrib should be finished as well.

The different modules will get their maven structure:
{noformat}
zookeeper-[something]
|-- src
|   |-- main
|   |   |-- java
|   |   |   \-- org...
|   |   \-- resources
|   |-- test (unit tests only)
|   |   |-- java
|   |   |   \-- org...
|   |   \-- resources
|   \-- it (integration tests)
\-- pom.xml
{noformat}
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 9 weeks, 6 days ago 0|i3z6w7:
ZooKeeper ZOOKEEPER-3170

Umbrella for eliminating ZooKeeper flaky tests

Test In Progress Major Unresolved Andor Molnar Andor Molnar Andor Molnar 15/Oct/18 06:17   23/Jan/20 13:17       tests   0 5   ZOOKEEPER-1802, ZOOKEEPER-2481, ZOOKEEPER-2485, ZOOKEEPER-2486, ZOOKEEPER-2493, ZOOKEEPER-2497, ZOOKEEPER-2499, ZOOKEEPER-2529, ZOOKEEPER-2538, ZOOKEEPER-2610, ZOOKEEPER-2679, ZOOKEEPER-2752, ZOOKEEPER-2753, ZOOKEEPER-2754, ZOOKEEPER-2781, ZOOKEEPER-2807, ZOOKEEPER-2877, ZOOKEEPER-2916, ZOOKEEPER-2966, ZOOKEEPER-2970, ZOOKEEPER-3023, ZOOKEEPER-3046, ZOOKEEPER-3047, ZOOKEEPER-3048, ZOOKEEPER-3089, ZOOKEEPER-3141, ZOOKEEPER-3193, ZOOKEEPER-3200, ZOOKEEPER-3201, ZOOKEEPER-3202, ZOOKEEPER-3222, ZOOKEEPER-3233, ZOOKEEPER-3429, ZOOKEEPER-3470, ZOOKEEPER-3477 Umbrella ticket for joint community efforts to reduce number of flaky tests and improve the stability of our Jenkins builds. 100% 59400 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 21 weeks, 6 days ago 0|i3z6v3:
ZooKeeper ZOOKEEPER-3169

Reduce session revalidation time after zxid roll over

Improvement Open Major Unresolved Unassigned 田毅群 田毅群 13/Oct/18 10:32   04/Nov/18 06:51   3.4.5, 3.5.0 3.4.5     0 2   1. Sometimes a ZooKeeper cluster receives a very large number of client connections, sometimes exceeding 10,000 (1W). When the zxid rolls over, the clients will reconnect and revalidate their sessions.

2. In ZooKeeper's design, when a follower server receives session revalidation requests, it forwards them to the leader server, which is responsible for session revalidation.

3. In a short time the leader has to handle a flood of these requests. Using a measurement tool, I found some clients had to wait over 20s, which is too long for latency-sensitive clients such as ResourceManager.

4. Proposed approach: when a zxid rollover happens, the leader records the exact time. Once re-election finishes, all servers learn the rollover time, so when clients reconnect and revalidate their sessions, any server can validate them locally. This greatly reduces pressure on the cluster, and all clients wait for less time.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Important
1 year, 19 weeks, 6 days ago 0|i3z5yv:
ZooKeeper ZOOKEEPER-3168

Reduce session revalidation time after zxid roll over

Improvement Resolved Major Invalid Unassigned 田毅群 田毅群 13/Oct/18 10:30   01/Nov/18 01:30 01/Nov/18 01:30 3.4.5, 3.5.0 3.4.5     0 1   1. Sometimes a ZooKeeper cluster receives a very large number of client connections, sometimes exceeding 10,000 (1W). When the zxid rolls over, the clients will reconnect and revalidate their sessions.

2. In ZooKeeper's design, when a follower server receives session revalidation requests, it forwards them to the leader server, which is responsible for session revalidation.

3. In a short time the leader has to handle a flood of these requests. Using a measurement tool, I found some clients had to wait over 20s, which is too long for latency-sensitive clients such as ResourceManager.

4. Proposed approach: when a zxid rollover happens, the leader records the exact time. Once re-election finishes, all servers learn the rollover time, so when clients reconnect and revalidate their sessions, any server can validate them locally. This greatly reduces pressure on the cluster, and all clients wait for less time.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Important
1 year, 22 weeks, 5 days ago 0|i3z5yn:
ZooKeeper ZOOKEEPER-3167

add an API and the corresponding CLI to get total count of recursive sub nodes under a specific path

New Feature Resolved Minor Fixed maoling 田毅群 田毅群 13/Oct/18 10:09   07/Feb/19 11:00 07/Feb/19 05:26 3.4.5, 3.5.0 3.6.0     0 4 0 9600   1. In a production environment there is often a node with a large number of recursive sub-nodes, and we need to count the total number of them.

2. Currently we can only use the getChildren API, which returns a List<String> of the first level of sub-nodes; we have to iterate every sub-node ourselves to reach the recursive sub-nodes, which costs a lot of time.

3. On the ZooKeeper server side, nodes are stored in a HashMap<String, DataNode> whose key is the node's path. We can iterate the map to get the total number of sub-nodes at all levels under a given node.
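The counting idea in point 3 can be sketched as a single pass over a path-keyed map (names are illustrative, not the actual server code):

```java
import java.util.Map;

// Sketch: the server keeps a Map<String, DataNode>-style map keyed by
// full path, so the number of recursive sub-nodes under a path can be
// found in one pass over the keys, without walking the tree.
public class SubtreeCount {

    /** Counts every node strictly below 'path', at all levels. */
    public static int countDescendants(Map<String, ?> nodes, String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        int count = 0;
        for (String p : nodes.keySet()) {
            // exclude the node itself and same-prefix siblings like "/ab" for "/a"
            if (!p.equals(path) && p.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```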
100% 100% 9600 0 patch, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
1 year, 6 weeks ago 0|i3z5xr:
ZooKeeper ZOOKEEPER-3166

Support changing secure port with reconfig

Improvement Open Minor Unresolved Unassigned Brian Nixon Brian Nixon 12/Oct/18 15:43   10/Jul/19 06:58   3.6.0   quorum   0 2   The reconfig operation supports changing the plaintext client port and client address but, because the secure client port is not encoded in the QuorumVerifier serialization, the secure client port cannot be changed by similar means. Instead, this information can only be changed in the static configuration files and only viewed there.

Flagging as a place where there's not feature parity between secure client ports and plaintext client ports.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 6 days ago 0|i3z5an:
ZooKeeper ZOOKEEPER-3165

Java 9: X509UtilTest.testCreateSSLContextWithoutTrustStorePassword fails

Bug Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 12/Oct/18 07:58   20/May/19 13:50 16/Oct/18 05:57 3.5.5 3.5.5 tests   0 2 0 4200   *Error Message*

Failed to create TrustManager

*Stacktrace*

org.apache.zookeeper.common.X509Exception$SSLContextException: Failed to create TrustManager
at org.apache.zookeeper.common.X509Util.createSSLContext(X509Util.java:210)
at org.apache.zookeeper.common.X509Util.createSSLContext(X509Util.java:163)
at org.apache.zookeeper.common.X509Util.getDefaultSSLContext(X509Util.java:147)
at org.apache.zookeeper.common.X509UtilTest.testCreateSSLContextWithoutTrustStorePassword(X509UtilTest.java:184)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: org.apache.zookeeper.common.X509Exception$TrustManagerException: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:299)
at org.apache.zookeeper.common.X509Util.createSSLContext(X509Util.java:207)
Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at java.base/java.security.cert.PKIXParameters.setTrustAnchors(PKIXParameters.java:200)
at java.base/java.security.cert.PKIXParameters.<init>(PKIXParameters.java:157)
at java.base/java.security.cert.PKIXBuilderParameters.<init>(PKIXBuilderParameters.java:130)
at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:274)
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 2 days ago 0|i3z4n3:
ZooKeeper ZOOKEEPER-3164

Backport ZOOKEEPER-3057 to branch3.5

Improvement Open Minor Unresolved Unassigned maoling maoling 10/Oct/18 06:24   20/Nov/18 09:36   3.4.6   server   0 1   https://issues.apache.org/jira/browse/ZOOKEEPER-3057 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 23 weeks, 1 day ago 0|i3z147:
ZooKeeper ZOOKEEPER-3163

Use session map to improve the performance when closing session in Netty

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 09/Oct/18 11:54   24/Oct/18 01:41 23/Oct/18 22:03   3.6.0 server   0 2 0 9000   Previously, closing a session required going through all the cnxns to find the one owning that session, which is O(N), where N is the total number of connections.

This hurts the performance of closing or renewing sessions when there are lots of connections on the server; this JIRA reuses the session-map code from the NIO implementation to improve it.
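A minimal model of the session-map idea (illustrative names, not the actual NettyServerCnxnFactory code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keep a sessionId -> connection map alongside the connection set, so
// closing a session is a single lookup instead of an O(N) scan over
// every cnxn, as described above.
public class SessionIndex<C> {
    private final Map<Long, C> sessionMap = new ConcurrentHashMap<>();

    public void register(long sessionId, C cnxn) {
        sessionMap.put(sessionId, cnxn);
    }

    /** O(1): removes and returns the connection that owned the session, or null. */
    public C closeSession(long sessionId) {
        return sessionMap.remove(sessionId);
    }

    public int size() {
        return sessionMap.size();
    }
}
```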
100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 21 weeks, 1 day ago 0|i3yzwn:
ZooKeeper ZOOKEEPER-3162

Broken lock semantics in C client lock-recipe

Bug Closed Major Fixed Andrea Reale Andrea Reale Andrea Reale 09/Oct/18 11:18   02/Apr/19 06:40 12/Nov/18 17:21 3.0.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 c client   0 2 0 10800   As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 and (partly) ZOOKEEPER-2878, the C client lock-recipe implementation is broken.

I identified three issues.

The main one (as also reported in the aforementioned reports) is that the logic that walks the lock waiting list is broken. child_floor uses strcmp and compares the full node name (i.e., sessionID-sequence) rather than only the sequence number. This makes it possible for two different clients to hold the lock at the same time: assume two clients, one associated with session A, the other with session B, with A < B lexicographically. Now assume that at some point a thread in B holds a lock and a thread in A tries to acquire the same lock. A will manage to get the lock because of the wrong comparison function, so two clients now hold the lock simultaneously.
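The comparison flaw can be illustrated in Java for brevity (the node-name format and helper names here are assumptions for the sketch; the real fix belongs in the C recipe's child_floor):

```java
// Lock nodes look roughly like "<prefix>-<sessionId>-<sequence>".
// Ordering them with a plain string compare (as child_floor does via
// strcmp) ranks by session id first, not by the ephemeral sequence
// number, which is what decides lock ownership.
public class LockOrder {

    /** Extracts the trailing zero-padded sequence number. */
    public static long sequenceOf(String node) {
        return Long.parseLong(node.substring(node.lastIndexOf('-') + 1));
    }

    /** Correct ordering: compare only the sequence numbers. */
    public static int compareBySequence(String a, String b) {
        return Long.compare(sequenceOf(a), sequenceOf(b));
    }
}
```

With nodes "x-aaa-0000000002" (session A, joined second) and "x-bbb-0000000001" (session B, holds the lock), a string compare puts A's node first, letting A believe it owns the lock; comparing sequences puts B first.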

The second issue is a possible deadlock inside zkr_lock_operation. zkr_lock_operation is always called while holding the mutex associated with the client lock. In some cases, zkr_lock_operation may decide to give up locking and call zkr_lock_unlock to release the lock. When this happens, it tries to acquire the same pthread mutex again, which leads to a deadlock.

The third issue relates to the return value of zkr_lock_lock. According to the API docs, the function returns 0 when there are no errors; it is then up to the caller to check whether the lock is held by calling zkr_lock_isowner. However, in the no-error case the implementation returns the value of zkr_lock_isowner. This is wrong because it becomes impossible to distinguish an error condition from success without ownership. Instead the API, as documented, should always return 0 when no error occurs.

Shortly I will add the link to a PR fixing the issues.

 
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
1 year, 18 weeks, 3 days ago 0|i3yztr:
ZooKeeper ZOOKEEPER-3161

Refactor QuorumPeerMainTest.java: move commonly used functions to base class

Improvement Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 08/Oct/18 09:08   12/Oct/18 08:05 12/Oct/18 04:27 3.5.4, 3.6.0 3.6.0 tests   0 2 0 5400   Move the following methods to QuorumPeerTestBase.java:
- tearDown()
- LaunchServers()
- waitForOne(), waitForAll()
- logStates()
100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 6 days ago 0|i3yxyn:
ZooKeeper ZOOKEEPER-3160

Custom User SSLContext

New Feature Resolved Minor Fixed Alex Rankin Alex Rankin Alex Rankin 02/Oct/18 11:46   25/Jan/19 14:23 25/Jan/19 08:32 3.5.4 3.6.0 java client   0 2 0 48600   The Zookeeper libraries currently allow you to set up your SSL Context via system properties such as "zookeeper.ssl.keyStore.location" in the X509Util. This covers most simple use cases, where users have software keystores on their harddrive.

There are, however, a few additional scenarios that this doesn't cover. Two possible ones would be:
# The user has a hardware keystore, loaded in using PKCS11 or something similar.
# The user has no access to the software keystore, but can retrieve an already-constructed SSLContext from their container.

For this, I would propose that the X509Util be extended to allow a user to set a property such as "zookeeper.ssl.client.context" to provide a class which supplies a custom SSL context. This gives a lot more flexibility to the ZK client, and allows the user to construct the SSLContext in whatever way they please (which also future proofs the implementation somewhat).

I've already completed this feature, and will put in a PR soon for it.
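A sketch of what such an extension point might look like; the property name comes from the description above, but the supplier interface and loading logic are hypothetical, not the committed ZooKeeper API:

```java
import javax.net.ssl.SSLContext;

// Illustrative extension point: the user names a class via a system
// property, and that class supplies an already-constructed SSLContext
// (e.g. backed by a PKCS11 hardware keystore or a container-provided
// context).
public class CustomSslContext {

    /** A user-supplied factory for an already-constructed SSLContext. */
    public interface Supplier {
        SSLContext get() throws Exception;
    }

    /** Example supplier: delegates to the JVM default context. */
    public static class DefaultSupplier implements Supplier {
        public SSLContext get() throws Exception {
            return SSLContext.getDefault();
        }
    }

    /** Instantiates the class named by the property and asks it for a context. */
    public static SSLContext fromProperty(String propertyName) throws Exception {
        String className = System.getProperty(propertyName);
        Supplier supplier = (Supplier) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        return supplier.get();
    }
}
```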
100% 100% 48600 0 features, pull-request-available, ready-to-commit 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 7 weeks, 6 days ago 0|i3yqnb:
ZooKeeper ZOOKEEPER-3159

Flaky: ClientRequestTimeoutTest.testClientRequestTimeout

Improvement Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 02/Oct/18 09:52   20/May/19 13:50 12/Oct/18 04:29 3.5.4, 3.6.0 3.6.0, 3.5.5 tests   0 3 0 6000   https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/212/ 100% 100% 6000 0 flaky, flaky-test, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 6 days ago 0|i3yqfz:
ZooKeeper ZOOKEEPER-3158

firstConnect.countDown() will not be executed when sendThread.primeConnection() has thrown an exception

Bug Open Trivial Unresolved Unassigned maoling maoling 28/Sep/18 23:47   19/Feb/19 05:49       server   0 1 0 6000   look at the source code in the ClientCnxnSocketNetty.connect(InetSocketAddress):

{code:java}
public void operationComplete(ChannelFuture channelFuture) throws Exception {
    // this lock guarantees that channel won't be assigned after cleanup().
    connectLock.lock();
    try {
        //----------------------
        sendThread.primeConnection();
        //-----------------------
        firstConnect.countDown();
        LOG.info("channel is connected: {}", channelFuture.getChannel());
    } finally {
        connectLock.unlock();
    }
}
});
{code}

firstConnect.countDown() will not be executed when sendThread.primeConnection() throws an exception; it should be moved into the finally block.
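A simplified model of the proposed fix, with the Netty types stripped out: the latch is released in finally, so a priming failure can no longer leave callers blocked on the first connect:

```java
import java.util.concurrent.CountDownLatch;

// Model of the fix: whatever primeConnection() does, the latch must be
// released, otherwise a caller blocked on firstConnect.await() hangs
// forever after a priming failure.
public class ConnectCallback {
    final CountDownLatch firstConnect = new CountDownLatch(1);

    public void operationComplete(Runnable primeConnection) {
        try {
            primeConnection.run();          // may throw, as in the report
        } finally {
            firstConnect.countDown();       // moved into finally: always runs
        }
    }
}
```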
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 24 weeks, 5 days ago 0|i3ymvz:
ZooKeeper ZOOKEEPER-3157

Improve FuzzySnapshotRelatedTest to avoid flaky due to issues like connection loss

Test Resolved Minor Fixed Andor Molnar Fangmin Lv Fangmin Lv 28/Sep/18 01:53   08/Oct/18 11:23 08/Oct/18 08:45 3.6.0 3.6.0 tests   0 4 0 3600   [~hanm] noticed that the test might fail because of ConnectionLoss when trying to getData ([here is an example|https://builds.apache.org/job/ZooKeepertrunk/198/testReport/junit/org.apache.zookeeper.server.quorum/FuzzySnapshotRelatedTest/testPZxidUpdatedWhenLoadingSnapshot]); we should catch this and retry to avoid flakiness.

Internally, we 'fixed' flaky tests by adding a junit RetryRule to ZKTestCase, the base class for most of the tests. I'm not sure whether this is the right way to go, since it actually 'hides' the flaky tests, but it would reduce the flakiness a lot if we're not going to tackle them in the near term, and we can check the test history to find out which tests are flaky and deal with them separately. So let me know if this seems to provide any short-term benefit; if it does, I'll provide a patch.
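The retry idea can be modeled in plain Java without the JUnit machinery (a sketch of the loop a RetryRule would run, not the actual rule, which wraps a JUnit Statement):

```java
import java.util.concurrent.Callable;

// Rerun a flaky body a few times and only fail if every attempt fails.
public class Retry {

    public static <T> T withRetries(Callable<T> body, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.call();
            } catch (Exception e) {
                last = e;   // remember the failure and retry
            }
        }
        throw last;         // all attempts failed
    }
}
```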
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 23 weeks, 3 days ago 0|i3ylmf:
ZooKeeper ZOOKEEPER-3156

ZOOKEEPER-2184 causes kerberos principal to not have resolved host name

Bug Closed Blocker Fixed Robert Joseph Evans Robert Joseph Evans Robert Joseph Evans 26/Sep/18 13:20   04/Oct/19 10:55 05/Nov/18 13:42 3.6.0, 3.4.13, 3.5.5 3.6.0, 3.5.5, 3.4.14 java client   0 6 0 44400   Prior to ZOOKEEPER-2184 the zookeeper client would canonicalize a configured host name before creating the SASL client which is used to create the principal name.  After ZOOKEEPER-2184 that canonicalization does not happen so the principal that the ZK client tries to use when it is configured to talk to a CName is different between 3.4.13 and all previous versions of ZK.

For example:

zk1.mycluster.mycompany.com maps to real-node.mycompany.com.

3.4.13 will want the server to have zookeeper/zk1.mycluster.com@KDC.MYCOMPANY.COM

3.4.12 wants the server to have zookeeper/real-node.mycompany.com@KDC.MYCOMPANY.COM

This makes 3.4.13 incompatible with many existing ZK setups. It would be nice to make that resolution optional, because in some cases it is desirable to have a single principal tied to the CName.
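The pre-ZOOKEEPER-2184 behavior amounts to canonicalizing the configured host before building the SASL principal; a sketch (the principal format is standard Kerberos, but the helper names here are hypothetical):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustration of the two behaviors described above: 3.4.12 built the
// principal from the canonical host, 3.4.13 from the configured CName.
public class ZkPrincipal {

    /** zookeeper/<host>@<realm> for an already-resolved host name. */
    public static String principal(String host, String realm) {
        return "zookeeper/" + host + "@" + realm;
    }

    /** Resolves the CName to its canonical host first (needs DNS at runtime). */
    public static String canonicalPrincipal(String configuredHost, String realm)
            throws UnknownHostException {
        String canonical = InetAddress.getByName(configuredHost).getCanonicalHostName();
        return principal(canonical, realm);
    }
}
```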
100% 100% 44400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 19 weeks, 3 days ago 0|i3yj5r:
ZooKeeper ZOOKEEPER-3155

ZOOKEEPER-925 Remove Forrest XMLs and their build process from the project

Sub-task Closed Blocker Fixed Tamas Penzes Tamas Penzes Tamas Penzes 26/Sep/18 02:44   02/Apr/19 06:40 09/Nov/18 19:10 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14     0 2 0 16800   Remove obsoleted Forrest XML files and their build process from the project. 100% 100% 16800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 18 weeks, 6 days ago 0|i3yi67:
ZooKeeper ZOOKEEPER-3154

ZOOKEEPER-925 Update release process to use the MarkDown solution

Sub-task Closed Major Fixed Tamas Penzes Tamas Penzes Tamas Penzes 26/Sep/18 02:43   02/Apr/19 06:40 15/Oct/18 10:49   3.6.0, 3.5.5, 3.4.14     0 2 0 19200   We have to update the release process to use the MarkDown solution.

We have to add the zookeeper-docs/src/main/resources/markdown and the zookeeper-docs/target/html directories and their content to the tarball.

We also have to remove the old mechanism of adding PDFs and old generated HTML files to the tarball in the same time.
100% 100% 19200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 3 days ago 0|i3yi5z:
ZooKeeper ZOOKEEPER-3153

ZOOKEEPER-925 Create MarkDown files and build process for them

Sub-task Closed Major Fixed Tamas Penzes Tamas Penzes Tamas Penzes 26/Sep/18 02:16   28/Nov/19 22:56 05/Oct/18 09:25 3.5.4, 3.4.13 3.6.0, 3.5.5, 3.4.14 documentation   0 2 0 19800   In this sub-task we have to transform the Forrest XML documents into MarkDown (.md) files and provide a (maven based) solution to create HTML documentation from them.

PDF support is dropped since it is not really used and makes everything overcomplicated.

The generated HTML content should look similar to the one generated from Forrest XMLs, but not needed to be identical with them.
100% 100% 19800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 23 weeks, 6 days ago Migrate documentation to MarkDown. 0|i3yi4n:
ZooKeeper ZOOKEEPER-3152

Port ZK netty stack to netty 4

Improvement Closed Minor Fixed Ilya Maykov Ilya Maykov Ilya Maykov 20/Sep/18 22:12   20/May/19 13:50 22/Nov/18 11:56 3.6.0 3.6.0, 3.5.5 java client, server   2 5 0 51000   Netty 3 is super old. Let's port ZK's netty stack to netty 4. I'm working on a patch that I will put up as a pull request on github once we finish testing it internally at Facebook, just getting the Jira ticket ready ahead of time. 100% 100% 51000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks ago 0|i3yc2n:
ZooKeeper ZOOKEEPER-3151

Jenkins github integration is broken if retriggering the precommit job through Jenkins admin web page.

Bug Resolved Minor Workaround Michael Han Michael Han Michael Han 19/Sep/18 14:37   06/Jan/20 15:08 19/Sep/18 19:04     build-infrastructure   0 3 0 11400   When a precommit check Jenkins job is triggered directly through the [web interface|https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/], the result can't be relayed back to GitHub after the job finishes. 100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 26 weeks, 1 day ago 0|i3y9nj:
ZooKeeper ZOOKEEPER-3150

ZOOKEEPER-3114 Data integrity check when loading snapshot/txns from disk

Sub-task Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 18/Sep/18 23:15   31/Jul/19 15:12 31/Jul/19 13:06   3.6.0 server   1 4 0 38400   This is a sub task of ZOOKEEPER-3114, which is going to check the data integrity by calculating the hash value of data tree, and compare the value when reload the snapshot/txns from disk. 100% 100% 38400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 1 day ago 0|i3y8m7:
ZooKeeper ZOOKEEPER-3149

Unreachable node can prevent remaining nodes from gaining quorum

Bug Resolved Minor Duplicate Unassigned Andrew January Andrew January 14/Sep/18 14:14   16/Sep/18 15:13 16/Sep/18 15:13 3.4.12   leaderElection   0 3   Steps to reproduce:
# Have a 3 node cluster set up, with node 2 as the leader, and node 3 zxid ahead of node 1 such that node 3 will be the new leader when node 2 disappears.
# Shut down node 2 such that it is unreachable and attempts to connect to it yield a socket timeout.
# Have the remaining two nodes get "Connection refused" responses almost immediately if one tries to connect to the other on a port that isn't open.

Expected behaviour:

The remaining nodes reach quorum.

Actual behaviour:

The remaining nodes repeatedly fail to reach quorum, spinning and holding elections until node 2 is brought back.

 

This is because:
# An election for a new leader starts.
# Both nodes broadcast notifications to all the other nodes
# Notifications reach node 1 quickly; each sender then tries node 2, which takes cnxTimeout (default 5s) to time out, before sending to node 3. This results in every notification to node 3 taking 5 seconds to arrive.
# Despite the delays, node 1 and node 3 agree that node 3 should be leader.
# node 1 sends the message that it will follow node 3, then immediately tries to connect to it as leader.
# Because of the delay, node 3 hasn't yet received the notification that node 1 is following it, so doesn't start accepting requests.
# This causes the requests from node 1 to fail quickly with "Connection refused".
# It retries 5 times (pausing a second between each)
# Because each refused connection fails after only 1/5th of cnxTimeout, node 1 exhausts its retries, gives up trying to follow node 3, and starts a new election.
# Node 3 times out waiting for node 1 to acknowledge it as leader, and starts a new election.

 

We can work around the issue by decreasing cnxTimeout to less than 5 seconds. However, it seems like a bad idea to rely on tweaking a value based on network performance, especially as the value is only configurable via JVM args rather than the conf files.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 6 days ago 0|i3y41j:
ZooKeeper ZOOKEEPER-3148

Fix Kerberos tests on branch 3.4 and JDK11

Bug Closed Critical Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 14/Sep/18 11:01   02/Apr/19 06:40 25/Sep/18 07:25 3.4.13 3.4.14 kerberos   0 3 0 4200   Branch 3.4 uses Apache Directory Service for Kerberos tests, which is not compatible with JDK 11.

A simple "upgrade" is not enough.

The fix is to port Kerby based tests from branch-3.5 to branch-3.4 and make old tests run *only on JDK6* and new tests with Kerby run on JDK7 onwards.

 

There will be some duplicated code, but branch-3.4 is expected to be deprecated soon, as 3.5 will be released as "stable".

Those "old" tests would be dropped if we decide to drop JDK6 support.

 

Additionally JDK6 VMs cannot download dependencies from Maven Central due to SSL policies:

[ivy:retrieve]     Server access error at url https://repo1.maven.org/maven2/net/minidev/json-smart/ (javax.net.ssl.SSLException: Received fatal alert: protocol_version)
[ivy:retrieve]     Server access error at url https://repo1.maven.org/maven2/net/minidev/json-smart/ (javax.net.ssl.SSLException: Received fatal alert: protocol_version)
100% 100% 4200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 25 weeks, 2 days ago 0|i3y3pz:
ZooKeeper ZOOKEEPER-3147

Enable server tracking client information

Improvement Open Major Unresolved Michael Han Michael Han Michael Han 12/Sep/18 17:51   12/Sep/18 17:51   3.6.0   java client, server   1 2   We should consider adding fine-grained tracking information for clients and sending it to the server side, which will be useful for debugging and for future multi-tenancy support / enforced quotas. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 1 day ago 0|i3y12f:
ZooKeeper ZOOKEEPER-3146

Limit the maximum client connections per IP in NettyServerCnxnFactory

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 12/Sep/18 14:59   22/Sep/18 02:45 22/Sep/18 00:48   3.6.0 server   0 3 0 9000   The NIOServerCnxnFactory implementation enforces a maximum number of connections per IP, but the Netty implementation does not; such a limit is useful to avoid the spamming we have seen on prod ensembles.

This Jira is going to add similar throttling logic in NettyServerCnxnFactory.
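The throttling logic can be sketched independently of Netty (illustrative names; the real change lives in NettyServerCnxnFactory's channel handlers):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Per-client-IP counter consulted on connect and released on disconnect.
public class PerIpLimiter {
    private final int maxPerIp;
    private final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    public PerIpLimiter(int maxPerIp) {
        this.maxPerIp = maxPerIp;
    }

    /** Called on connect; false means the connection should be dropped. */
    public boolean tryAcquire(String ip) {
        AtomicInteger c = counts.computeIfAbsent(ip, k -> new AtomicInteger());
        if (c.incrementAndGet() > maxPerIp) {
            c.decrementAndGet();   // undo: this IP is over the limit
            return false;
        }
        return true;
    }

    /** Called on disconnect. */
    public void release(String ip) {
        AtomicInteger c = counts.get(ip);
        if (c != null) {
            c.decrementAndGet();
        }
    }
}
```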
100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 25 weeks, 5 days ago 0|i3y0uf:
ZooKeeper ZOOKEEPER-3145

Potential watch missing issue due to stale pzxid when replaying CloseSession txn with fuzzy snapshot

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 11/Sep/18 19:47   11/Sep/19 09:30 11/Sep/19 04:27 3.5.4, 3.6.0, 3.4.13 3.6.0 server   1 3 0 26400   This is another issue I found recently, we haven't seen this problem on prod (or maybe we don't notice).

 
Currently, CloseSession is not idempotent: executing it twice does not produce the same result.
 
The problem is that closeSession only looks up the ephemeral nodes associated with that session based on the current state. Nodes deleted while a fuzzy snapshot is being taken won't be deleted again when the txn is replayed.
 
This looks fine, since the node is already gone, but there is a problem with the pzxid of the parent node. The snapshot is taken fuzzily, so the parent may have been serialized while the nodes were being deleted by the closeSession txn. The pzxid will not be updated in the snapshot when replaying the closeSession txn, because the replay doesn't know which paths were deleted, so it can't patch the pzxid as we did for deleteNode in ZOOKEEPER-3125.
 
The inconsistent pzxid can lead to missed watch notifications when a client reconnects with setWatches, because of the staleness.
 
This JIRA fixes these issues by adding a CloseSessionTxn that records all the nodes deleted by the CloseSession txn, so that we know which nodes to update when replaying it.
100% 100% 26400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 1 day ago 0|i3xzm7:
ZooKeeper ZOOKEEPER-3144

Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 11/Sep/18 19:14   14/Sep/18 18:55 14/Sep/18 18:08 3.5.4, 3.6.0, 3.4.13 3.6.0 server   0 4 0 1800   Found this issue recently while checking another prod issue: the current code updates lastProcessedZxid before it actually makes the change for the global sessions in the DataTree.
 
If a snapshot is being taken and there is a small stall between setting lastProcessedZxid and updating the session in the DataTree (due to a thread context switch, GC, etc.), it's possible that lastProcessedZxid is effectively set to the future and does not reflect the global-session change (add or remove).
 
When this snapshot and its txns are reloaded, txns are replayed from lastProcessedZxid + 1, so the global session is never recreated, which can leave the data inconsistent.
 
When global sessions are inconsistent, ephemerals may become inconsistent as well: the leader deletes all ephemerals locally if there is no global session associated with them, and any server that snapshot-syncs from it will also be missing those ephemerals, while other servers will still have them. The problematic session will also have trouble renewing its global session.
 
The same issue exists for the closeSession txn; we need to move the global-session update logic before processTxn, so that lastProcessedZxid does not miss the global session.
 
 
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 6 days ago 0|i3xzlb:
ZooKeeper ZOOKEEPER-3143

ZOOKEEPER-3092 Pluggable metrics system for ZooKeeper - Data Collection on Server

Sub-task Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 11/Sep/18 10:31   22/Apr/19 22:22 12/Apr/19 13:02 3.6.0 3.6.0 metric system   0 2 0 40200   This task is to integrate the new MetricsProvider system with ZooKeeper Server code.

Ideally we should port every metrics available on 4lw to the new system.
100% 100% 40200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
48 weeks, 5 days ago 0|i3xyu7:
ZooKeeper ZOOKEEPER-3142

Extend SnapshotFormatter to dump data in json format

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 10/Sep/18 18:52   24/Nov/18 14:55 14/Sep/18 19:10 3.6.0 3.6.0     0 2 0 3600   Json format can be chained into other tools such as ncdu. Extend the SnapshotFormatter functionality to dump json.
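A minimal sketch of rendering a nested path tree as JSON (assumes keys need no escaping; the output shape is illustrative, while the real change extends SnapshotFormatter):

```java
import java.util.Map;

// Depth-first render of a tree where each value is either a nested Map
// (a directory of child nodes) or a leaf value such as a data length.
public class JsonTreeDump {

    public static String toJson(Object node) {
        if (node instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                if (!first) sb.append(",");
                first = false;
                sb.append("\"").append(e.getKey()).append("\":")
                  .append(toJson(e.getValue()));
            }
            return sb.append("}").toString();
        }
        return String.valueOf(node);   // leaf value
    }
}
```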

 
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 5 days ago 0|i3xxqv:
ZooKeeper ZOOKEEPER-3141

ZOOKEEPER-3170 testLeaderElectionWithDisloyalVoter is flaky

Sub-task Reopened Major Unresolved Unassigned Michael Han Michael Han 10/Sep/18 18:28   04/Oct/19 10:55   3.6.0   leaderElection, server, tests   0 3   The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.

See [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]

Recent failure builds:

[https://builds.apache.org/job/ZooKeeper-trunk//181] 

[https://builds.apache.org/job/ZooKeeper-trunk//179] 

[https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/] 

 

Snapshot of the failure:
{code:java}
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority

Error Message
Server 0 should have joined quorum by now
Stacktrace
junit.framework.AssertionFailedError: Server 0 should have joined quorum by now
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks ago 0|i3xxpj:
ZooKeeper ZOOKEEPER-3140

Allow Followers to host Observers

New Feature Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 07/Sep/18 19:28   09/Dec/18 01:21 08/Dec/18 21:17 3.6.0 3.6.0 server   1 4 0 30000   Observers function simply as non-voting members of the ensemble, sharing the Learner interface with Followers and holding only a slightly different internal pipeline. Both maintain connections along the quorum port with the Leader, by which they learn of all new proposals on the ensemble.

There are benefits to allowing Observers to connect to the Followers to plug into the commit stream, in addition to connecting to the Leader. It shifts the burden of supporting Observers off the Leader and allows it to focus on coordinating the commit of writes. This means better performance when the Leader is under high load, particularly high network load such as can happen after a leader election when many Learners need to sync. It also reduces the total number of network connections maintained on the Leader when there is a high number of observers. On the other end, Observer availability is improved since it takes less time for a high number of Observers to finish syncing and start serving client traffic.

The current implementation only supports scaling the number of Observers into the hundreds before performance begins to degrade. By opening up Followers to also host Observers, over a thousand observers can be hosted on a typical ensemble without major negative impact under both normal operation and during post-leader election sync.
100% 100% 30000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks, 4 days ago 0|i3xv4n:
ZooKeeper ZOOKEEPER-3139

Zookeeper is not getting started because server folder is not present in the jre/bin for version 1.8.0_181 on 32 bit machine

Bug Open Major Unresolved Unassigned Nitish Kulkarni Nitish Kulkarni 07/Sep/18 05:32   19/Sep/18 08:22   3.4.13   server   0 2 18000 18000 0% Windows 10 Enterprise N. I am trying to run ZooKeeper version 2.11-1.1.0 on a 32-bit machine. I have installed jdk 1.8.0_181, but ZooKeeper is not running and displays the following error:

Error: missing `server' JVM at `C:\Program Files (x86)\Java\jre1.8.0_181\bin\server\jvm.dll'.
Please install or use the JRE or JDK that contains these missing components.

This is because jdk1.8.0_181 does not create the server folder that contains jvm.dll.

So please let me know how ZooKeeper is going to address this issue, because if it is not resolved ZooKeeper won't run on 32-bit machines.
0% 0% 18000 18000 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 1 day ago 0|i3xu5r:
ZooKeeper ZOOKEEPER-3138

Potential race condition with Quorum Peer mutual authentication via SASL

Bug Open Major Unresolved Unassigned Grzegorz Grzybek Grzegorz Grzybek 05/Sep/18 11:37   06/Sep/18 02:36   3.4.13   leaderElection, quorum, security   0 4   I'm in the process of reconfiguring the ensemble to use mutual quorum peer authentication using SASL (ZOOKEEPER-1045).

In order to understand the impact on my code, I've checked _how it works_. Now I'm running & debugging {{org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest#testValidCredentials()}} test case.

I now have six threads (3 peers contacting each other):
* "QuorumConnectionThread-[myid=0]-2@1483" prio=5 tid=0x2b nid=NA runnable
* "QuorumConnectionThread-[myid=0]-3@1491" prio=5 tid=0x36 nid=NA runnable
* "QuorumConnectionThread-[myid=1]-1@1481" prio=5 tid=0x2d nid=NA runnable
* "QuorumConnectionThread-[myid=1]-4@1505" prio=5 tid=0x3c nid=NA runnable
* "QuorumConnectionThread-[myid=2]-2@1495" prio=5 tid=0x37 nid=NA runnable
* "QuorumConnectionThread-[myid=2]-4@1506" prio=5 tid=0x3d nid=NA runnable

at this point of invocation:
{noformat}
java.lang.Thread.State: RUNNABLE
at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handleNameCallback(SaslQuorumServerCallbackHandler.java:101)
at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handle(SaslQuorumServerCallbackHandler.java:82)
at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.authenticate(SaslQuorumAuthServer.java:100)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:467)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:386)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReceiverThread.run(QuorumCnxManager.java:422)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}

which is this line:
{code:java}
private void handleNameCallback(NameCallback nc) {
    // check to see if this user is in the user password database.
    if (credentials.get(nc.getDefaultName()) == null) {
        LOG.warn("User '{}' not found in list of DIGEST-MD5 authenticateable users.",
                nc.getDefaultName());
        return;
    }
    nc.setName(nc.getDefaultName());
    /* >>> */ userName = nc.getDefaultName(); /* <<< */
}
{code}

Each pair of threads is operating on a single instance of {{org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler#userName}}. In the stack trace we have both shared and local variables/fields:
* {{o.a.z.server.quorum.QuorumCnxManager.QuorumConnectionReceiverThread#sock}} is thread-specific (ok)
* {{o.a.z.server.quorum.QuorumCnxManager#authServer}} is peer-specific (instance of {{o.a.z.server.quorum.auth.SaslQuorumAuthServer}}) but without a state that changes
* {{javax.security.sasl.SaslServer}} is thread-specific (ok) - this instance is created to handle sasl authentication, but is created using a peer-specific JAAS subject (which is ok) and the peer-specific {{o.a.z.server.quorum.auth.SaslQuorumAuthServer#serverLogin.callbackHandler}} {color:red}which is potentially a problem{color}

Each of the six threads handles a different connection, but each pair (for a given QuorumPeer) calls {{o.a.z.server.quorum.auth.SaslQuorumServerCallbackHandler#handleNameCallback()}}, which modifies the shared (peer-specific) field {{userName}}.

I understand that [according to the example from Wiki|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication] all peers may use the same credentials (in simplest case).

But the "userName" comes from data sent by each peer, like this:
{noformat}
charset=utf-8,\
username="test",\
realm="zk-quorum-sasl-md5",\
nonce="iBqYWtaCrEE013S6Dv6xiOsR9uX2l/qKZcEZ1pm2",\
nc=00000001,\
cnonce="LVaL9XYFjNxVBPCjPewXjEBsj9GuwIfBN/RXsKt5",\
digest-uri="zookeeper-quorum/zk-quorum-sasl-md5",\
maxbuf=65536,\
response=dd4e9e2115ec2e304484d5191f3fc771,\
qop=auth,\
authzid="test"
{noformat}

*And I can imagine a JAAS configuration for the DIGEST-MD5 SASL algorithm in which each peer uses its own credentials and is able to validate the other peers' specific credentials:*
{noformat}
QuorumServer {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_peer1="peer1"
    user_peer2="peer2"
    user_peer3="peer3";
};
QuorumLearner1 {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="peer1"
    password="peer1";
};
QuorumLearner2 {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="peer2"
    password="peer2";
};
QuorumLearner3 {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="peer3"
    password="peer3";
};

Isn't this a race condition? For example, with 3 peers:
||thread handling peer 2 → peer 1 connection||thread handling peer 3 → peer 1 connection||
|sets o.a.z.s.q.auth.SaslQuorumServerCallbackHandler#userName to "peer2"| |
| |sets o.a.z.s.q.auth.SaslQuorumServerCallbackHandler#userName to "peer3"|
|sets PasswordCallback.password to o.a.z.s.q.auth.SaslQuorumServerCallbackHandler#credentials.get("peer3")| |
| | continues ...|
|com.sun.security.sasl.digest.DigestMD5Base#generateResponseValue() generates the expected response using username "peer2" but the password of user "peer3"| |
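The suspected interleaving can be reproduced in miniature. The sketch below is illustrative only ({{SharedHandler}} and {{lookupPassword}} are hypothetical stand-ins, not ZooKeeper classes): two connection threads funnel through one handler whose {{userName}} field is shared, so one thread's username can end up paired with the other thread's password.

```java
// Hypothetical stand-in for the shared SaslQuorumServerCallbackHandler:
// two connection threads of the same peer write the same userName field.
public class SharedHandlerRaceSketch {

    static class SharedHandler {
        String userName; // shared across threads, unsynchronized

        String lookupPassword(String requestedUser) {
            userName = requestedUser;          // name callback writes the field
            // ... a second thread may overwrite userName right here ...
            return "password-of-" + userName;  // password callback reads it back
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHandler handler = new SharedHandler();
        String[] results = new String[2];
        Thread t1 = new Thread(() -> results[0] = handler.lookupPassword("peer2"));
        Thread t2 = new Thread(() -> results[1] = handler.lookupPassword("peer3"));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Under an unlucky interleaving, results[0] can be "password-of-peer3":
        // peer2's digest would then be verified against peer3's password.
        System.out.println(results[0] + " / " + results[1]);
    }
}
```
A per-connection local (or a thread-confined handler instance) would remove the shared write.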

Please verify.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 1 day ago 0|i3xrev:
ZooKeeper ZOOKEEPER-3137

add a utility to truncate logs to a zxid

New Feature Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 31/Aug/18 18:43   24/Nov/18 14:55 14/Sep/18 18:04 3.6.0 3.6.0     0 2 0 7800   Add a utility that allows an admin to truncate a given transaction log to a specified zxid. This can be similar to the existing LogFormatter.

Among the benefits, this allows an admin to put together a point-in-time view of a data tree by manually mutating files from a saved backup.
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 6 days ago 0|i3xn87:
ZooKeeper ZOOKEEPER-3136

Reduce log in ClientBase in case of ConnectException

Task Resolved Minor Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 31/Aug/18 17:36   06/Sep/18 22:20 06/Sep/18 20:49   3.6.0 tests   0 2 0 3600   While running tests you will always see spammy log lines like the ones below.

As we are expecting the server to be up, it is not useful to log such stacktraces.

The patch simply reduces the logging in this specific case, because the stacktrace adds no value and is very noisy.

 
{code:java}
     [junit] 2018-08-31 23:31:49,173 [myid:] - INFO  [main:ClientBase@292] - server 127.0.0.1:11222 not up
    [junit] java.net.ConnectException: Connection refused (Connection refused)
    [junit]     at java.net.PlainSocketImpl.socketConnect(Native Method)
    [junit]     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    [junit]     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    [junit]     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    [junit]     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    [junit]     at java.net.Socket.connect(Socket.java:589)
    [junit]     at org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:101)
    [junit]     at org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:71)
    [junit]     at org.apache.zookeeper.test.ClientBase.waitForServerUp(ClientBase.java:285)
    [junit]     at org.apache.zookeeper.test.ClientBase.waitForServerUp(ClientBase.java:276)
{code}
 
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 6 days ago 0|i3xn5r:
ZooKeeper ZOOKEEPER-3135

update lastSend, lastHeard with current timestamp when client reconnects successfully

Bug Open Minor Unresolved Unassigned yangkun yangkun 29/Aug/18 22:58   21/Feb/20 01:49           1 3 0 3600   The ClientCnxnSocket#updateLastSendAndHeard() method updates lastSend and lastHeard to the cached now:

 
{code:java}
void updateLastSendAndHeard() {
    this.lastSend = now;
    this.lastHeard = now;
}

void updateNow() {
    now = Time.currentElapsedTime();
}{code}
In the SendThread#run() method, several places call updateLastSendAndHeard(), simplified as follows:

 
{code:java}
public void run() {
    clientCnxnSocket.updateNow();
    // place-1. update lastSend and lastHeard
    clientCnxnSocket.updateLastSendAndHeard();
    while (state.isAlive()) {
        try {
            // ...some operations
            startConnect(serverAddress);
            // place-2. update lastSend and lastHeard
            clientCnxnSocket.updateLastSendAndHeard();
        } // ...
    }
}{code}
 

If so, the lastSend and lastHeard values at place-1 and place-2 are equal. However, the operations between place-1 and place-2 consume some time, so the two values should actually differ.
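A minimal model of this cached-clock behaviour (a sketch, not the real ClientCnxnSocket; timestamps are passed in explicitly for clarity) shows why both call sites record the same value unless updateNow() runs again in between:

```java
// Minimal model of ClientCnxnSocket's cached clock: updateLastSendAndHeard()
// copies the cached 'now', so two calls without an intervening updateNow()
// record identical timestamps.
public class CachedClockSketch {
    long now, lastSend, lastHeard;

    void updateNow(long currentTime) { now = currentTime; }

    void updateLastSendAndHeard() { lastSend = now; lastHeard = now; }

    public static void main(String[] args) {
        CachedClockSketch s = new CachedClockSketch();
        s.updateNow(1000);          // SendThread#run() entry
        s.updateLastSendAndHeard(); // place-1
        // ... startConnect() takes, say, 50ms of wall time ...
        s.updateLastSendAndHeard(); // place-2: still records the stale 1000
        System.out.println(s.lastSend); // 1000
        // The reported fix: refresh the cached clock before place-2.
        s.updateNow(1050);
        s.updateLastSendAndHeard();
        System.out.println(s.lastSend); // 1050
    }
}
```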

 

 

 

 
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
3 weeks, 6 days ago 0|i3xknz:
ZooKeeper ZOOKEEPER-3134

NIOServerCnxnFactory#run() method should remove synchronized (this)

Improvement Open Minor Unresolved Unassigned yangkun yangkun 29/Aug/18 09:34   29/Aug/18 09:34           0 1   Currently, the NIOServerCnxnFactory#run() method is:

 
{code:java}
while (!ss.socket().isClosed()) {
    try {
        selector.select(1000);
        Set<SelectionKey> selected;
        // should remove synchronized?
        synchronized (this) {
            selected = selector.selectedKeys();
        }
        ArrayList<SelectionKey> selectedList = new ArrayList<SelectionKey>(selected);
        ...
    }
}
{code}
It seems there is no need for the synchronized (this) statement here; this code path is thread safe, so should the statement be removed?

Or does this statement serve some purpose?

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 29 weeks, 1 day ago 0|i3xjzb:
ZooKeeper ZOOKEEPER-3133

NIOServerCnxn.outstandingRequests is not updated correctly with sasl request

Bug Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 28/Aug/18 19:52   14/Dec/19 06:09   3.5.4, 3.6.0 3.7.0     0 1   outstandingRequests is decremented when we send the response for a sasl request, but it is never incremented before we handle the request, so the count can be wrong, and the socket may be enabled to receive packets before it should be. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 29 weeks, 2 days ago 0|i3xizj:
ZooKeeper ZOOKEEPER-3132

org.apache.zookeeper.server.WatchManager resource leak

Bug Resolved Major Duplicate Unassigned ChaoWang ChaoWang 28/Aug/18 04:33   28/Aug/18 04:40 28/Aug/18 04:40 3.5.3, 3.5.4   server   0 1   -Xmx512m  In some cases, the entry for a watcher in the map _watch2Paths_ in class _WatchManager_ is not removed, even when the associated HashSet value is already empty.

The key type of the map _watch2Paths_ is Watcher, an instance of _NettyServerCnxn_. If the entry is not removed when the associated set of paths is empty, memory usage grows little by little until an OutOfMemoryError is finally triggered.

 

In the following function, the logic should be added to remove the entry.

org.apache.zookeeper.server.WatchManager#removeWatcher(java.lang.String, org.apache.zookeeper.Watcher)

if (paths.isEmpty()) {
    watch2Paths.remove(watcher);
}

For the following function as well:

org.apache.zookeeper.server.WatchManager#triggerWatch(java.lang.String, org.apache.zookeeper.Watcher.Event.EventType, java.util.Set<org.apache.zookeeper.Watcher>)

 

Please confirm this issue?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 29 weeks, 2 days ago 0|i3xhof:
ZooKeeper ZOOKEEPER-3131

org.apache.zookeeper.server.WatchManager resource leak

Bug Closed Major Fixed Fangmin Lv ChaoWang ChaoWang 28/Aug/18 04:32   20/May/19 13:50 06/Sep/18 20:35 3.5.3, 3.5.4, 3.6.0 3.6.0, 3.5.5 server   0 6 0 15600   -Xmx512m  In some cases, the entry for a watcher in the map _watch2Paths_ in class _WatchManager_ is not removed, even when the associated HashSet value is already empty.

The key type of the map _watch2Paths_ is Watcher, an instance of _NettyServerCnxn_. If the entry is not removed when the associated set of paths is empty, memory usage grows little by little until an OutOfMemoryError is finally triggered.

 

{color:#FF0000}*Possible Solution:*{color}

In the following function, the logic should be added to remove the entry.

org.apache.zookeeper.server.WatchManager#removeWatcher(java.lang.String, org.apache.zookeeper.Watcher)

if (paths.isEmpty()) {
    watch2Paths.remove(watcher);
}

For the following function as well:

org.apache.zookeeper.server.WatchManager#triggerWatch(java.lang.String, org.apache.zookeeper.Watcher.Event.EventType, java.util.Set<org.apache.zookeeper.Watcher>)
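The proposed cleanup can be sketched as follows (illustrative only — this is not the actual WatchManager code, and the same isEmpty() check would apply in triggerWatch as well): once a watcher's path set drains, its map entry is dropped so dead connection keys cannot accumulate.

```java
import java.util.*;

// Sketch of the proposed fix: remove the watcher's map entry once its
// path set becomes empty, so empty-set entries cannot leak memory.
public class Watch2PathsSketch {
    private final Map<Object, Set<String>> watch2Paths = new HashMap<>();

    public void addWatch(Object watcher, String path) {
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    public void removeWatcher(String path, Object watcher) {
        Set<String> paths = watch2Paths.get(watcher);
        if (paths == null) return;
        paths.remove(path);
        if (paths.isEmpty()) {
            watch2Paths.remove(watcher); // the fix: no empty-set entries linger
        }
    }

    public int watcherCount() { return watch2Paths.size(); }
}
```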

 

Please confirm this issue?
100% 100% 15600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 6 days ago 0|i3xho7:
ZooKeeper ZOOKEEPER-3130

What is the amount of zookeeper source code?

Wish Resolved Major Invalid Unassigned Micheal_Bruce_Long Micheal_Bruce_Long 27/Aug/18 05:31   28/Aug/18 08:33 27/Aug/18 07:26 3.4.6       0 2   NONE  Hello:

      How large is the ZooKeeper source code? Using Eclipse I counted only 37035 lines in version 3.4.6, which seems so small that I can hardly believe it.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 29 weeks, 2 days ago 0|i3xfyn:
ZooKeeper ZOOKEEPER-3129

Improve ZK Client resiliency by throwing a jute.maxbuffer size exception before sending a request to server

Improvement Open Major Unresolved Unassigned Karan Mehta Karan Mehta 25/Aug/18 17:41   07/Sep/18 02:00           1 4   Zookeeper is mostly operated in controlled environments and the client/server properties are usually known. With this Jira, I would like to propose a new property on the client side that represents the max jute buffer size the server is going to accept.

On the ZKClient, in the case of a multi op, the request is serialized, and hence we know the size of the complete packet that will be sent. We can use this new property to determine if we are exceeding the limit and throw some form of KeeperException. This would be a fail-fast mechanism, and the application could retry by chunking up the request or serializing it.
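A fail-fast check of this kind could look like the sketch below (checkRequestSize and the limit constant are assumptions for illustration, not an existing ZooKeeper API; the server-side jute.maxbuffer default is configurable and roughly 1 MB):

```java
// Sketch of a client-side size check before a serialized multi op is sent.
// The property name and the 1 MB default are assumptions for illustration.
public class MaxBufferCheckSketch {
    static final int DEFAULT_JUTE_MAXBUFFER = 1024 * 1024; // assumed ~1 MB

    static void checkRequestSize(byte[] serialized, int clientMaxBuffer) {
        if (serialized.length > clientMaxBuffer) {
            // fail fast on the client instead of failing silently server-side
            throw new IllegalArgumentException(
                "Serialized request of " + serialized.length
                + " bytes exceeds client-side limit " + clientMaxBuffer);
        }
    }

    public static void main(String[] args) {
        checkRequestSize(new byte[512], DEFAULT_JUTE_MAXBUFFER); // passes
        try {
            checkRequestSize(new byte[2 * 1024 * 1024], DEFAULT_JUTE_MAXBUFFER);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected before hitting the server");
        }
    }
}
```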

Since the same property would now be present in two locations, two possibilities can arise over time.

-- The server's jute.maxbuffer value is larger than what is specified on the client side

The application might end up serializing it, or the zkclient can be made configurable to retry even when it gets this exception.

-- The server's jute.maxbuffer value is lower than what is specified on the client side

That would have failed previously as well, so there is no change in behavior.

This would help avoid silent failures like HBASE-18549.

Thoughts [~apurtell] [~xucang] [~anmolnar] [~hanm]
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 6 days ago 0|i3xf9j:
ZooKeeper ZOOKEEPER-3128

CLI Commands display Authentication error for Authorization error

Bug Open Minor Unresolved Mohammad Arshad Mohammad Arshad Mohammad Arshad 24/Aug/18 03:02   05/Feb/20 07:16     3.7.0, 3.5.8 server   1 3   CLI commands display "Authentication is not valid : /path123" when the user does not have access to the znode /path123.

For example  command
{code:java}
get /path456 {code}
will display error message
{code:java}
Authentication is not valid : /path456 {code}
if the user does not have read access to the znode /path456.

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 29 weeks, 6 days ago 0|i3xdl3:
ZooKeeper ZOOKEEPER-3127

Fixing potential data inconsistency due to update last processed zxid with partial multi-op txn

Bug Closed Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 23/Aug/18 02:36   04/Oct/19 10:55 05/Sep/18 16:36 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5 server   0 5 0 6600   Found this issue while checking the code for another issue; this is a relatively rare case which we haven't seen in production so far.

Currently, lastProcessedZxid is updated when applying the first txn of a multi-op. If there is a snapshot in progress, it's possible that the zxid associated with the snapshot covers only part of the multi-op.

When loading a snapshot, the server only replays the txns after the zxid associated with the snapshot file, which could cause data inconsistency due to missing sub-txns.

To avoid this, we only update lastProcessedZxid after the whole multi-op txn has been applied to the DataTree.
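The fix can be modelled in a few lines (a sketch, not the DataTree code; the names here are illustrative — the point is only the ordering of the volatile write relative to the sub-txn applies):

```java
import java.util.*;

// Model of the fix: a concurrent snapshot thread reads lastProcessedZxid,
// so the field is advanced only after every sub-txn of the multi op has
// been applied, never on the first sub-txn.
public class MultiOpZxidSketch {
    volatile long lastProcessedZxid;
    final List<String> applied = new ArrayList<>();

    void processMultiTxn(long zxid, List<String> subTxns) {
        for (String sub : subTxns) {
            applied.add(sub); // apply each sub-txn to the tree
            // lastProcessedZxid is NOT touched here; updating it on the
            // first sub-txn was the root of the reported inconsistency
        }
        lastProcessedZxid = zxid; // advance only once the multi op is whole
    }
}
```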
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 1 day ago 0|i3xc3j:
ZooKeeper ZOOKEEPER-3126

Documentation: Overview page has missing content

Bug Open Minor Unresolved Unassigned Pritish Kapoor Pritish Kapoor 21/Aug/18 07:15   29/Aug/18 04:19   3.4.0   documentation   1 3   Documentation: Overview page [https://zookeeper.apache.org/doc/current/zookeeperOver.html] has missing content marked as _[tbd]_

Refer section "Nodes and ephemeral nodes" - Last line:

"Ephemeral nodes are useful when you want to implement _[tbd]_."

Similarly, many other lines are present with "_[tbd]_"
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 29 weeks, 1 day ago 0|i3x9bb:
ZooKeeper ZOOKEEPER-3125

Pzxid inconsistent issue when replaying a txn for a deleted node

Bug Closed Blocker Fixed Fangmin Lv Fangmin Lv Fangmin Lv 20/Aug/18 16:13   20/May/19 13:51 08/Nov/18 13:56   3.6.0, 3.5.5 server   0 4 0 34200   When taking a snapshot or syncing a snapshot from the leader, the snapshot is fuzzy, which means the parent node might already be serialized before the child gets deleted; when replaying the txn, the parent's pzxid update is skipped in this case, which causes inconsistency. 100% 100% 34200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 13 weeks ago 0|i3x8if:
ZooKeeper ZOOKEEPER-3124

Add the correct comment to show why we need the special logic to handle cversion and pzxid

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 20/Aug/18 16:11   10/Sep/19 07:55 10/Sep/19 03:54   3.6.0 server   1 5 0 10800   The old comment about setCversionPzxid is not valid; the scenario it mentions won't trigger the issue. Update it to state the exact reason.
 
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
27 weeks, 2 days ago 0|i3x8i7:
ZooKeeper ZOOKEEPER-3123

ZOOKEEPER-3092 MetricsProvider Lifecycle in ZooKeeper Server

Sub-task Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 17/Aug/18 09:08   11/Sep/18 11:23 11/Sep/18 09:23 3.6.0 3.6.0 metric system   0 2 0 26400   This subtask is for the lifecycle code of the configured MetricsProvider inside the ZooKeeper server, in both standalone mode and quorum peer mode.

 
100% 100% 26400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 2 days ago 0|i3x5lj:
ZooKeeper ZOOKEEPER-3122

ZOOKEEPER-3021 Verify build after maven migration and the end artifact

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 17/Aug/18 06:34   09/Apr/19 20:27 07/Feb/19 03:17 3.6.0 3.4.14 build, scripts   0 2 0 41400   Verify maven build works as expected, scripts (release, precommit, jenkins, reports etc) work with maven.
100% 100% 41400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
49 weeks, 2 days ago 0|i3x5fr:
ZooKeeper ZOOKEEPER-3121

DELETE - test

Bug Resolved Major Fixed Unassigned pacawat k pacawat k 15/Aug/18 04:19   21/Aug/18 12:53 15/Aug/18 04:20         0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 31 weeks, 1 day ago 0|i3x25b:
ZooKeeper ZOOKEEPER-3120

add NetBeans nbproject directory to .gitignore

Task Closed Minor Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 14/Aug/18 17:45   04/Oct/19 10:55 17/Aug/18 04:35   3.6.0, 3.5.5, 3.4.14     0 2 0 3000   100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 30 weeks, 6 days ago 0|i3x1q7:
ZooKeeper ZOOKEEPER-3119

DELETE - test

Bug Resolved Major Fixed Unassigned pacawat k pacawat k 14/Aug/18 00:51   21/Aug/18 12:53 15/Aug/18 04:12     build, jmx, kerberos   0 3   test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 31 weeks, 2 days ago 0|i3x0l3:
ZooKeeper ZOOKEEPER-3118

DELETE - test

Bug Resolved Major Invalid Unassigned pacawat k pacawat k 13/Aug/18 06:34   27/Aug/19 09:57 27/Aug/19 09:57     build, jmx   0 1   test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 31 weeks, 3 days ago 0|i3wzaf:
ZooKeeper ZOOKEEPER-3117

Correct the LeaderBean.followerInfo to only return the followers list

Bug Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 11/Aug/18 16:58   17/Aug/18 09:40 17/Aug/18 08:40   3.6.0 quorum   0 3 0 1800   LeaderBean.followerInfo returns all the learners, which includes the observers, not only the followers; correct it to match the name. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 30 weeks, 6 days ago 0|i3wylj:
ZooKeeper ZOOKEEPER-3116

Make the DataTree.approximateDataSize more efficient

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 09/Aug/18 00:39   04/Sep/18 06:51 04/Sep/18 05:55   3.6.0 server   0 2 0 4800   The approximateDataSize is a nice metric that shows the total size stored in the ZooKeeper ensemble over time, but it's expensive to query often, since each query walks all the nodes to calculate the total size.

It's better to use a counter that records the total data size as txns are applied to the DataTree, which is cheaper.
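A sketch of the counter-based accounting (illustrative — the method names and the size formula are assumptions, not the real DataTree bookkeeping): the total is adjusted as txns create, delete, or overwrite nodes, so reading it becomes O(1) instead of a full tree walk.

```java
// Running-total accounting: the size delta of each txn is applied to a
// counter, so approximateDataSize() no longer needs to walk every node.
public class DataSizeCounterSketch {
    private long totalSize;

    void onCreate(String path, byte[] data) {
        totalSize += path.length() + (data == null ? 0 : data.length);
    }

    void onDelete(String path, byte[] oldData) {
        totalSize -= path.length() + (oldData == null ? 0 : oldData.length);
    }

    void onSetData(byte[] oldData, byte[] newData) {
        totalSize += (newData == null ? 0 : newData.length)
                   - (oldData == null ? 0 : oldData.length);
    }

    long approximateDataSize() { return totalSize; } // O(1) instead of O(n)
}
```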
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 2 days ago 0|i3wuxz:
ZooKeeper ZOOKEEPER-3115

Delete snapshot file on error

Improvement Open Minor Unresolved kevin.chen Brian Nixon Brian Nixon 08/Aug/18 21:12   20/Aug/18 05:12   3.6.0   server   0 1   ZOOKEEPER-3082 guards against one particular failure mode that can cause a corrupt snapshot, when an empty file is created with a valid snapshot file name. All other instances of IOException when writing the snapshot are simply allowed to propagate up the stack.

One idea that came up during review ([https://github.com/apache/zookeeper/pull/560]) was whether we would ever want to leave a snapshot file on disk when an IOException is thrown. Clearly something has gone wrong at this point, and rather than leave a potentially corrupt file, we can delete it and trust the transaction log when restoring the necessary transactions.

It would be great to modify FileTxnSnapLog::save to delete snapshot files more often on exceptions - provided that there's a way to tell whether the file in that case is needed or corrupt.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks ago 0|i3wut3:
ZooKeeper ZOOKEEPER-3114

Built-in data consistency check inside ZooKeeper

New Feature Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 07/Aug/18 21:02   27/Dec/19 16:36 27/Dec/19 16:36   3.6.0 quorum   2 4   ZOOKEEPER-3150, ZOOKEEPER-3512 The correctness of ZooKeeper was proved in theory in the ZAB paper, but the implementation differs a bit from the paper. For example, saving the currentEpoch and the proposals/commits up to NEWLEADER is not atomic in the current implementation, so the correctness of ZooKeeper is not actually proved in practice.

Also, bugs could be introduced during implementation; issues like sending the NEWLEADER packet too early, reported in ZOOKEEPER-3104, might have been there since the beginning (we didn't check exactly when this was introduced).

More correctness issues were introduced when adding new features, like on-disk txn sync, local sessions, database retention, etc.; these features added inconsistency bugs in production.

To catch consistency issues earlier, internally we're running external consistency checkers to compare nodes (digests), but that's not efficient (slow and expensive), and there are corner cases we cannot cover in an external checker. For example, we don't know the last zxid before an epoch change, which makes it impossible to check whether a txn is missing. Another challenge is the false negatives, which are hard to avoid due to fuzzy snapshots or expected txn gaps during snapshot syncing, etc.

This Jira is going to propose a built-in real-time consistency check by calculating the digest of the DataTree after applying each txn and sending it over to the learners at propose time, so that they can verify correctness in real time.

The consistency check will cover all phases, including loading during startup, syncing, and broadcasting. It can help us avoid data loss or data corruption due to bad disks and catch bugs in the code.

The protocol change is backward compatible, so we can enable/disable this feature transparently.

As for performance impact, based on our testing, it adds a bit of overhead at runtime, but has no obvious impact in general.
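The digest idea above can be modelled in a toy form (the fold below is purely illustrative and is not the hash ZooKeeper uses): both leader and learner fold each applied txn into a running digest, and any missing or reordered txn makes the digests diverge.

```java
// Toy model of the digest check: an order-sensitive fold over (zxid, txn)
// pairs, so a missing or reordered txn changes the resulting digest.
public class TreeDigestSketch {
    private long digest;

    void applyTxn(long zxid, String txn) {
        digest = digest * 31 + zxid;
        digest = digest * 31 + txn.hashCode();
    }

    long digest() { return digest; }

    public static void main(String[] args) {
        TreeDigestSketch leader = new TreeDigestSketch();
        TreeDigestSketch learner = new TreeDigestSketch();
        leader.applyTxn(1, "create /a");
        learner.applyTxn(1, "create /a");
        System.out.println(leader.digest() == learner.digest()); // true: consistent
        learner.applyTxn(2, "delete /a"); // learner diverges from the leader
        System.out.println(leader.digest() == learner.digest()); // false: caught
    }
}
```
The real proposal would ship the leader's digest with each proposal, letting the learner compare in real time.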
100% 66000 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
11 weeks, 6 days ago 0|i3wt5j:
ZooKeeper ZOOKEEPER-3113

EphemeralType.get() fails to verify ephemeralOwner when currentElapsedTime() is small enough

Bug Closed Critical Fixed Andor Molnar Andor Molnar Andor Molnar 07/Aug/18 10:58   04/Oct/19 10:55 18/Oct/18 05:19 3.5.4, 3.6.0 3.6.0, 3.5.5 server   1 5 0 16200   The EphemeralTypeTest.testServerIds() unit test fails on some systems where System.nanoTime() is smaller than a certain value.

The test generates the ephemeralOwner in the old way (pre ZOOKEEPER-2901) without enabling the emulation flag and asserts that an exception is thrown when serverId == 255. This is correct: ZooKeeper should fail in this case, because serverId cannot be larger than 254 if extended types are enabled. In that case an ephemeralOwner with 0xff in the most significant byte indicates an extended type.

The logic which does the validation is in EphemeralType.get().

It checks 2 things:
* the extended type byte is set: 0xff,
* reserved bits (next 2 bytes) corresponds to a valid extended type.

Here is the problem: currently we only have 1 extended type: TTL with value of 0x0000 in the reserved bits.

Logic expects that if we have anything different from it in the reserved bits, the ephemeralOwner is invalid and an exception should be thrown. That's what the test asserts, and it works on most systems, because the timestamp part of the sessionId usually has some bits set in the reserved bits as well, which will eventually be larger than 0, so the value is unsupported.

I think the problem is twofold:
* Either, if we add more extended types, we increase the possibility that this logic accepts invalid sessionIds (as long as the reserved bits indicate a valid extended type),
* Or (which happens on some systems), if currentElapsedTime (the timestamp part of the sessionId) is small enough that it doesn't occupy the reserved bits, this logic accepts the invalid sessionId.

Unfortunately I cannot reproduce the problem yet: it constantly happens on a specific Jenkins slave, but even with the same distro and same JDK version I cannot reproduce the same nanoTime() values.
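The bit layout under discussion can be sketched like this (the constants mirror the description above; this is an illustration, not the EphemeralType implementation): byte 7 holds the serverId (0xff marks an extended type), bytes 6-5 hold the reserved extended-type id, and the remaining bits carry the timestamp-derived part.

```java
// Sketch of the sessionId / ephemeralOwner bit layout described above.
public class EphemeralOwnerSketch {
    static final long EXTENDED_MASK = 0xffL << 56;        // top byte = 0xff
    static final long RESERVED_BITS_MASK = 0xffffL << 40; // next 2 bytes
    static final int TTL_TYPE = 0x0000; // the only extended type today

    // mirrors the old scheme: serverId in the top byte, time-derived bits below
    static long makeSessionId(long serverId, long elapsedTime) {
        return (serverId << 56) | ((elapsedTime << 24) >>> 8);
    }

    static boolean looksLikeExtendedType(long owner) {
        if ((owner & EXTENDED_MASK) != EXTENDED_MASK) return false;
        long reserved = (owner & RESERVED_BITS_MASK) >>> 40;
        return reserved == TTL_TYPE; // the validation the report says is too weak
    }

    public static void main(String[] args) {
        // Large elapsed time: reserved bits are non-zero, so the invalid
        // owner is (correctly) rejected as an unknown extended type.
        System.out.println(looksLikeExtendedType(makeSessionId(255, 0x123456789AL))); // false
        // Small elapsed time: reserved bits stay zero and the invalid owner
        // slips through as a "valid" TTL type - the flaky case.
        System.out.println(looksLikeExtendedType(makeSessionId(255, 1L))); // true
    }
}
```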
100% 100% 16200 0 pull-request-available, ttl_nodes 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks ago 0|i3wscf:
ZooKeeper ZOOKEEPER-3112

fd leak due to UnresolvedAddressException on connect.

Bug Open Critical Unresolved Unassigned Tianzhou Wang Tianzhou Wang 06/Aug/18 20:46   16/May/19 10:13   3.5.4, 3.4.13   java client   3 5 600 0 5400 900% If the domain being connected to fails to resolve and leads to an UnresolvedAddressException, the fd is leaked. 100% 100% 5400 0 600 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch
1 year, 17 weeks ago 0|i3wrdz:
ZooKeeper ZOOKEEPER-3111

Add socket buffer size option to tune the TCP throughput between leader and learner

Improvement Resolved Minor Not A Problem Fangmin Lv Fangmin Lv Fangmin Lv 06/Aug/18 20:45   19/Dec/19 17:59 05/Sep/18 17:15     server   0 2 0 12000   Add a socket setting that lets us tune the TCP receive and send window sizes to improve throughput during network congestion or when transferring large snapshots during syncing. 100% 100% 12000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 1 day ago 0|i3wrdj:
ZooKeeper ZOOKEEPER-3110

Improve the closeSession throughput in PrepRequestProcessor

Improvement Closed Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 06/Aug/18 20:17   20/May/19 13:50 08/Aug/18 00:30   3.6.0, 3.5.5 quorum   0 3 0 1800   On the leader, every expired global session adds 3 lines of logs, which is pretty heavy; if the log file grows to more than a few GB, the closeSession logging in PrepRequestProcessor slows down the whole ensemble's throughput. 

In some use cases, we found the PrepRequestProcessor becomes a bottleneck when there is a constantly high number of expired sessions or explicitly closed sessions.

This JIRA removes one of the useless log lines produced when preparing closeSession txns, which should give us higher throughput when processing a large number of expired sessions.
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks, 1 day ago 0|i3wrc7:
ZooKeeper ZOOKEEPER-3109

Avoid long unavailable time due to voter changed mind when activating the leader during election

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 01/Aug/18 18:05   06/Oct/18 08:24 29/Aug/18 00:27 3.6.0 3.6.0 quorum, server   0 6 0 12000   Occasionally, we'll find it takes a long time to elect a leader, sometimes longer than 1 minute, depending on how large initLimit and tickTime are set.
 
This exposes an issue in the leader election protocol. During leader election, before a voter goes to the LEADING/FOLLOWING state, it waits for the finalizeWait time before changing its state. Depending on the order of notifications, a voter might change its mind just after voting for a server. If the server it was previously voting for has a majority of votes after counting this one, that server will go to the LEADING state. In some corner cases, the leader may end up timing out while waiting for the epoch ACK from a majority, because of the voter that changed its mind. This usually happens when there is an even number of servers in the ensemble (either because one of the servers is down, or because one is being restarted and takes a long time to restart). If there are 5 servers in the ensemble, we'll find two of them in the LEADING/FOLLOWING state and another two in the LOOKING state, but the LOOKING servers cannot join the quorum, since they're waiting for a majority of servers to be FOLLOWING the current leader before changing to FOLLOWING as well.
 
As far as we know, a voter will change its mind if it receives a vote from another host which has just started and begun voting for itself, or if a server takes a long time to shut down its previous ZK server and starts voting for itself when it begins the leader election process.
 
Also, the follower may abandon the leader if the leader is not ready to accept learner connections when the follower tries to connect to it.
 
To solve this issue, there are multiple options: 

1. increase the finalizeWait time

2. smartly detect this state on leader and quit earlier

 
The 1st option is straightforward and easier to implement, but it will cause a longer leader election time in common cases.
 
The 2nd option is more complex, but it can efficiently solve the problem without sacrificing performance in common cases. It remembers the first majority of servers voting for it, and checks whether any of them changed its mind while waiting for the epoch ACK. The leader waits for some time before quitting the LEADING state, since one changed voter may not be a problem if a majority is still voting for it.
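The idea behind option 2 could be sketched roughly like this (the class and method names are hypothetical, not the actual election code):

```java
// Hedged sketch of option 2: remember the first majority that voted for this
// leader, then flag any of those voters that later votes for someone else.
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class ChangedMindDetector {
    private final Set<Long> initialVoters = new HashSet<>();

    void recordInitialQuorum(Collection<Long> voters) {
        initialVoters.addAll(voters);
    }

    // true if a server from the electing quorum is now voting for another server
    boolean changedMind(long serverId, long votedFor, long self) {
        return initialVoters.contains(serverId) && votedFor != self;
    }
}
```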
100% 100% 12000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 23 weeks, 5 days ago 0|i3wldb:
ZooKeeper ZOOKEEPER-3108

use a new property:myid in the zoo.cfg to substitute for myid file under the dataDir

Improvement Resolved Major Won't Fix maoling maoling maoling 31/Jul/18 08:31   04/Oct/19 10:55 25/Aug/19 21:59 3.5.0   server   0 9 0 10800   When using ZooKeeper in distributed mode, we need to touch a myid file in dataDir and write a unique number to it. This is inconvenient and not user-friendly. Look at an example from another distributed system such as Kafka: it just uses broker.id=0 in server.properties to identify a unique server node. This issue proposes abandoning the myid file and using a new property such as server.id=0 in zoo.cfg. This fix will be applied to the master branch and branch-3.5+,
keeping branch-3.4 unchanged.
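The proposal could look something like this (the property name `server.id` is this issue's suggestion, not an existing ZooKeeper option):

```properties
# Today: a separate file is required, e.g.
#   echo 2 > /var/lib/zookeeper/myid
# Proposed: identify the node directly in zoo.cfg, Kafka-style
server.id=2
```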
100% 100% 10800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 5 days ago 0|i3wipb:
ZooKeeper ZOOKEEPER-3107

Coding standard and Checkstyle

Improvement Open Minor Unresolved Unassigned Praveendra Singh Praveendra Singh 30/Jul/18 17:55   31/Jul/18 08:46       build   0 3   I'm new to ZooKeeper as a contributor.

Was going through [https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute] and noticed that the link to [Sun's conventions|http://www.oracle.com/technetwork/java/codeconv-138413.html] doesn't work. Some googling showed that it is archived at [https://www.oracle.com/technetwork/java/javase/overview/codeconvtoc-136057.html].

 

Do we still use this coding standard?

Apart from the code styling rules, we have additional ones listed on the Contributor Guide.

Instead of letting everyone remember all the rules, should we force it at build time?

 

There is a [Maven Checkstyle Plugin|https://maven.apache.org/plugins/maven-checkstyle-plugin/]  which can be leveraged.

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 2 days ago 0|i3whv3:
ZooKeeper ZOOKEEPER-3106

Zookeeper client supports IPv6 address and document the "IPV6 feature"

Improvement Resolved Major Resolved maoling maoling maoling 29/Jul/18 23:43   13/Oct/18 09:49 13/Oct/18 09:49     documentation, java client   0 4 0 13200   This issue is the follow-up work of [ZOOKEEPER-3057|https://issues.apache.org/jira/browse/ZOOKEEPER-3057]
1. The ZK server side supports an IPv6 style like server.1=[2001:db8:1::242:ac11:2]:2888:3888, but the ZK client side supports IPv6 like 2001:db8:1::242:ac11:2:2181. We need to unify them.
Look at the Kafka example [KAFKA-1123|https://issues.apache.org/jira/browse/KAFKA-1123]: its producer client also supports IPv6 like [2001:db8:1::242:ac11:2].
2. Document the "IPV6 feature" to let users know.
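A unified client-side parse might look like this (a minimal sketch, not the actual ConnectStringParser code):

```java
// Hedged sketch: accept the bracketed "[v6addr]:port" form on the client,
// matching the server-side server.N=[addr]:2888:3888 style.
public class ConnectStringSketch {
    static String[] splitHostPort(String s) {
        if (s.startsWith("[")) {            // bracketed IPv6 literal
            int end = s.indexOf(']');
            return new String[] { s.substring(1, end), s.substring(end + 2) };
        }
        int colon = s.lastIndexOf(':');     // hostname or IPv4
        return new String[] { s.substring(0, colon), s.substring(colon + 1) };
    }
}
```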
100% 100% 13200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 23 weeks, 3 days ago 0|i3wgjb:
ZooKeeper ZOOKEEPER-3105

Character coding problem occur when create a node using python3

Bug Closed Major Fixed Unassigned yang hao yang hao 26/Jul/18 08:53   16/Oct/19 14:59 29/Jul/18 21:22 3.5.0, 3.6.0, 3.4.14 3.6.0, 3.4.15, 3.5.6 contrib   0 7 3600 3600 0% linux When creating a node using python3, InvalidACLException occurs all the time. It's caused by an incompatible way of parsing the acl passed through the python3 api. 0% 0% 3600 3600 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
39 weeks, 1 day ago 0|i3wckv:
ZooKeeper ZOOKEEPER-3104

Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 25/Jul/18 12:27   09/Dec/19 16:24 03/Aug/18 12:53 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5 server   0 13 0 7800   Currently, in SNAP sync, the leader starts queuing the proposals/commits and the NEWLEADER packet before sending the snapshot over the wire. So it's possible that the zxid associated with the snapshot is higher than the zxids of some packets queued before NEWLEADER.
 
When the follower receives the snapshot, it applies all the txns queued before NEWLEADER, which may not cover all the txns up to the zxid in the snapshot. After that, it writes the snapshot out to disk with the zxid associated with the snapshot. If the server crashes after writing this out, then when loading the data from disk it will use the zxid of the snapshot file to sync with the leader, which could cause data inconsistency, because we only replayed part of the historical data during the previous sync.
 
The NEWLEADER packet means the learner now has a correct and almost up-to-date state relative to the leader, so it makes more sense to move the NEWLEADER packet to after sending the snapshot, and this is what we did in the fix.
 
Besides this, the socket timeout is changed to the smaller sync timeout after the NEWLEADER ack is received. In high-write-traffic ensembles with large snapshots, the follower might be timed out by the leader before it finishes sending the queued txns after writing the snapshot out, which could leave the follower stuck in the syncing state forever. Moving the NEWLEADER packet to after sending the snapshot avoids this issue as well.
100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
14 weeks, 3 days ago 0|i3w9zj:
ZooKeeper ZOOKEEPER-3103

ZOOKEEPER-3092 Pluggable metrics system for ZooKeeper - MetricsProvider API definition

Sub-task Resolved Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 24/Jul/18 06:48   14/Aug/18 10:13 14/Aug/18 08:34 3.6.0 3.6.0 metric system   0 4 0 22800   This sub task is for the design of the MetricsProvider API, that is the API to be implemented by a MetricsProvider in order to be plugged into a ZooKeeper server or ZooKeeper client 100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 31 weeks, 2 days ago 0|i3w7uv:
ZooKeeper ZOOKEEPER-3102

Potential race condition when create ephemeral nodes

Bug Open Minor Unresolved Unassigned LuoFucong LuoFucong 24/Jul/18 02:08   08/Aug/18 22:47   3.6.0   server   0 4 0 7200   operating system: macOS High Sierra 10.13.6

java version: 8u152

 
The method 
{code:java}
public void createNode(final String path, byte data[], List<ACL> acl, long ephemeralOwner, int parentCVersion, long zxid, long time, Stat outputStat)
{code}
 

in class DataTree may conceal a potential race condition regarding the session ephemeral nodes map "Map<Long, HashSet<String>> ephemerals".

Specifically, the codes start from line 455:

 
{code:java}
} else if (ephemeralOwner != 0) {
    HashSet<String> list = ephemerals.get(ephemeralOwner);
    if (list == null) {
        list = new HashSet<String>();
        ephemerals.put(ephemeralOwner, list);
    }
    synchronized (list) {
        list.add(path);
    }
}
{code}
 

When an ephemeral owner tries to create nodes concurrently (under different parent nodes), an empty "HashSet<String>" might be created multiple times, each replacing the previous one.

The following unit test reveals the race condition:

 
{code:java}
@Test(timeout = 60000)
public void testSessionEphemeralNodesConcurrentlyCreated()
        throws InterruptedException, NodeExistsException, NoNodeException {
    long session = 0x1234;
    int concurrent = 10;
    Thread[] threads = new Thread[concurrent];
    CountDownLatch latch = new CountDownLatch(1);
    for (int i = 0; i < concurrent; i++) {
        String parent = "/test" + i;
        dt.createNode(parent, new byte[0], null, 0, -1, 1, 1);

        Thread thread = new Thread(() -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

            String path = parent + "/0";
            try {
                dt.createNode(path, new byte[0], null, session, -1, 1, 1);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
        thread.start();
        threads[i] = thread;
    }
    latch.countDown();
    for (Thread thread : threads) {
        thread.join();
    }
    int sessionEphemerals = dt.getEphemerals(session).size();
    Assert.assertEquals(concurrent, sessionEphemerals);
}
{code}
The session "0x1234" has created 10 ephemeral nodes "/test\{0~9}/0" concurrently (in 10 threads), so its ephemeral node count retrieved from DataTree should be 10, but it isn't (the assertion fails).

 

The fix should be easy:

 
{code:java}
private final ConcurrentMap<Long, HashSet<String>> ephemerals = new ConcurrentHashMap<>();

...

} else if (ephemeralOwner != 0) {
    HashSet<String> list = ephemerals.get(ephemeralOwner);
    if (list == null) {
        list = new HashSet<String>();
        HashSet<String> _list;
        if ((_list = ephemerals.putIfAbsent(ephemeralOwner, list)) != null) {
            list = _list;
        }
    }
    synchronized (list) {
        list.add(path);
    }
}
{code}
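On Java 8+, the same putIfAbsent dance can be collapsed into one atomic call (a sketch, assuming the map is switched to ConcurrentHashMap as in the proposed fix):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EphemeralsSketch {
    private final ConcurrentMap<Long, Set<String>> ephemerals = new ConcurrentHashMap<>();

    // computeIfAbsent guarantees that all racing threads get the same set,
    // which is exactly the property the original HashMap code lacked.
    public void addEphemeral(long owner, String path) {
        Set<String> list = ephemerals.computeIfAbsent(owner, k -> new HashSet<>());
        synchronized (list) {
            list.add(path);
        }
    }

    public int count(long owner) {
        Set<String> list = ephemerals.get(owner);
        return list == null ? 0 : list.size();
    }
}
```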
 

 

 
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 34 weeks, 1 day ago 0|i3w7i7:
ZooKeeper ZOOKEEPER-3101

Add comment reminding users to add cases to zerror when adding values to ZOO_ERRORS

Improvement Open Trivial Unresolved Kent R. Spillner Kent R. Spillner Kent R. Spillner 23/Jul/18 15:51   08/Aug/18 13:12           0 1 0 5400   Add a comment at the bottom of ZOO_ERRORS reminding people to add cases to zerror when adding new error values (via https://github.com/apache/zookeeper/pull/575#issuecomment-406356144) 100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks, 1 day ago 0|i3w6vz:
ZooKeeper ZOOKEEPER-3100

ZooKeeper client times out due to random choice of resolved addresses

Bug Open Major Unresolved Andor Molnar Rajini Sivaram Rajini Sivaram 23/Jul/18 10:13   04/Oct/19 10:55   3.4.13   java client   0 8   The changes to ZooKeeper clients to re-resolve hosts made under ZOOKEEPER-2184 result in delays when only a subset of the addresses that a host resolves to are actually reachable. This can result in connection timeouts on the client.

For example, when running tests with a single ZooKeeper server accepting connections on 127.0.0.1 on a host that has both IPv4 and IPv6, we have seen connection timeouts in tests if client connects using `localhost` rather than `127.0.0.1`. ZooKeeper client resolves `localhost` to both the IPv4 and IPv6 addresses and chooses a random one. If IPv6 was chosen, a fixed one second backoff is applied before retry since there is only one hostname specified. After backoff, 'localhost' is resolved again and a random address chosen, which could also be the unconnectable IPv6 address.

For the list of host names specified for connection, the clients do round-robin without backoffs until connections to all hostnames are attempted. Can we also do the same for addresses that each of the hosts resolves to, so that backoffs are only applied after connection to each address is attempted once and every address is connected to once using round-robin rather than random selection? This will avoid delays in cases where at least one address can be connected to.
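The suggested per-address round-robin could be sketched like this (a minimal illustration, not the actual StaticHostProvider code):

```java
// Hedged sketch of the suggested behavior: cycle through every resolved
// address once before applying any backoff, instead of random selection.
import java.util.List;

public class RoundRobinAddresses {
    private final List<String> addresses;
    private int next = 0;

    public RoundRobinAddresses(List<String> addresses) {
        this.addresses = addresses;
    }

    // Each call returns the next address in order; a complete cycle means
    // every address has been attempted once and a backoff is now justified.
    public String nextAddress() {
        String a = addresses.get(next);
        next = (next + 1) % addresses.size();
        return a;
    }

    public boolean cycleComplete() {
        return next == 0;
    }
}
```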

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 6 days ago 0|i3w6dj:
ZooKeeper ZOOKEEPER-3099

ZooKeeper cluster is unavailable for session_timeout time due to network partition in a three-node environment.  

Bug Open Major Unresolved Unassigned Jiafu Jiang Jiafu Jiang 23/Jul/18 01:30   14/Oct/18 21:52   3.4.11, 3.5.4, 3.4.12, 3.4.13   c client, java client   0 5    

The default readTimeout of the ZooKeeper client is 2/3 * session_timeout; the default connectTimeout is session_timeout / hostProvider.size(). If the ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_timeout.

 

Suppose we have three ZooKeeper servers deployed: zk1, zk2, zk3, and zk3 is currently the leader. Client c1 is connected to zk2 (a follower). Then we shut down the network of zk3 (the leader); at the same time, client c1 begins to write some data to ZooKeeper. After a (syncLimit * tick) timeout, zk2 disconnects from the leader, begins a new election, and becomes the leader.

 

The write operation will not succeed because the leader is unavailable. It will take at most readTimeout for c1 to discover the failure, and then c1 will try to choose another ZooKeeper server. Unfortunately, c1 may choose zk3, which is unreachable now, so c1 will spend connectTimeout finding out that zk3 is unusable. Notice that readTimeout + connectTimeout = session_timeout in my case (a three-node cluster).
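The arithmetic above, made explicit (the 30-second session timeout is just an example value):

```java
// With 3 servers, readTimeout + connectTimeout adds up to the full session
// timeout, which is why the cluster can look unavailable for that long.
public class TimeoutMath {
    static int readTimeoutMs(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;       // 2/3 of the session timeout
    }

    static int connectTimeoutMs(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs / serverCount; // session timeout / #servers
    }
}
```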

 

Therefore, in this case, the ZooKeeper cluster is unavailable for the full session timeout when only one ZooKeeper server is unreachable due to a network partition.

 

I have some suggestions:
# The HostProvider used by ZooKeeper can be specified by an argument.
# readTimeout should also be configurable in some way.

 

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 22 weeks, 3 days ago 0|i3w5qf:
ZooKeeper ZOOKEEPER-3098

Add additional server metrics

Improvement Resolved Major Fixed Joseph Blomstedt Joseph Blomstedt Joseph Blomstedt 20/Jul/18 15:55   17/Sep/18 20:15 17/Sep/18 19:00 3.6.0 3.6.0 server   0 5 0 21600   This patch adds several new server-side metrics as well as makes it easier to add new metrics in the future. This patch also includes a handful of other minor metrics-related changes.

Here's a high-level summary of the changes.
# This patch extends the request latency tracked in {{ServerStats}} to track {{read}} and {{update}} latency separately. Updates are any request that must be voted on and can change data, reads are all requests that can be handled locally and don't change data.
# This patch adds the {{ServerMetrics}} logic and the related {{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to make it incredibly easy to add new metrics. To add a new metric you just add one line to {{ServerMetrics}} and then directly reference that new metric anywhere in the code base. The {{ServerMetrics}} logic handles creating the metric, properly adding the metric to the JSON output of the {{/monitor}} admin command, and properly resetting the metric when necessary. The motivation behind {{ServerMetrics}} is to make things easy enough that it encourages new metrics to be added liberally. Lack of in-depth metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most of our internal changes build on {{ServerMetrics}} and we have nearly 100 internal metrics at this time – all of which we'll be upstreaming in the coming months as we publish more internal patches.
# This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
# This patch replaces some uses of {{synchronized}} in {{ServerStats}} with atomic operations.

Here's a list of new metrics added in this patch:
- {{uptime}}: time that a peer has been in a stable leading/following/observing state
- {{leader_uptime}}: uptime for peer in leading state
- {{global_sessions}}: count of global sessions
- {{local_sessions}}: count of local sessions
- {{quorum_size}}: configured ensemble size
- {{synced_observers}}: similar to existing `synced_followers` but for observers
- {{fsynctime}}: time to fsync transaction log (avg/min/max)
- {{snapshottime}}: time to write a snapshot (avg/min/max)
- {{dbinittime}}: time to reload database – read snapshot + apply transactions (avg/min/max)
- {{readlatency}}: read request latency (avg/min/max)
- {{updatelatency}}: update request latency (avg/min/max)
- {{propagation_latency}}: end-to-end latency for updates, from proposal on leader to committed-to-datatree on a given host (avg/min/max)
- {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
- {{election_time}}: time between entering and leaving election (avg/min/max)
- {{looking_count}}: number of transitions into looking state
- {{diff_count}}: number of diff syncs performed
- {{snap_count}}: number of snap syncs performed
- {{commit_count}}: number of commits performed on leader
- {{connection_request_count}}: number of incoming client connection requests
- {{bytes_received_count}}: similar to existing `packets_received` but tracks bytes
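An avg/min/max metric of the kind listed above could be sketched like this (the class name echoes the patch's AvgMinMaxCounter, but this is an illustrative reimplementation, not the actual ZooKeeper code):

```java
import java.util.concurrent.atomic.AtomicLong;

public class AvgMinMaxCounterSketch {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Lock-free updates, in the spirit of replacing synchronized with atomics.
    public void add(long value) {
        count.incrementAndGet();
        total.addAndGet(value);
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
    }

    public double avg() {
        long c = count.get();
        return c == 0 ? 0.0 : (double) total.get() / c;
    }

    public long min() { return min.get(); }
    public long max() { return max.get(); }
}
```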
100% 100% 21600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 26 weeks, 3 days ago 0|i3w4mn:
ZooKeeper ZOOKEEPER-3097

Use Runnable instead of Thread for working items in WorkerService to improve the throughput of CommitProcessor

Improvement Closed Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 20/Jul/18 15:13   20/May/19 13:50 26/Jul/18 23:13 3.6.0 3.6.0, 3.5.5 server   0 3 0 1800   CommitProcessor uses WorkerService to submit read/write tasks; each task is initialized as a thread, which is heavy. Changing it to a lighter Runnable object avoids the overhead of initializing a thread and shows promising improvement in the CommitProcessor. 100% 100% 1800 0 performance, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 6 days ago 0|i3w4l3:
ZooKeeper ZOOKEEPER-3096

Leader should not leak LearnerHandler threads

Bug Open Major Unresolved Michael Han Michael Han Michael Han 20/Jul/18 14:45   20/Jul/18 14:45   3.5.4, 3.6.0, 3.4.13   quorum, server   0 2   Currently we don't track LearnerHandler threads in the leader; we rely on the socket timeout to raise an exception and use that exception as a signal to let the LearnerHandler thread kill itself. In cases where a learner restarts, if the time from the beginning to the end of the restart is less than the socket timeout value (currently hardcoded as initLimit * tickTime), then no exception is raised and the previous LearnerHandler thread corresponding to this learner leaks.

I have a test case and a proposed fix which I will submit later.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 34 weeks, 6 days ago 0|i3w4in:
ZooKeeper ZOOKEEPER-3095

Connect string fix for non-existent hosts

Improvement Resolved Minor Fixed Mohamed Jeelani Mohamed Jeelani Mohamed Jeelani 20/Jul/18 14:00   21/Feb/20 17:28 27/Jul/18 22:47 3.4.0 3.6.0 other   0 2 0 6000   Connect string fix for non-existent hosts 100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 5 days ago 0|i3w4ev:
ZooKeeper ZOOKEEPER-3094

Make BufferSizeTest reliable

Improvement Closed Minor Fixed Mohamed Jeelani Mohamed Jeelani Mohamed Jeelani 20/Jul/18 13:41   02/Apr/19 06:40 26/Jul/18 23:29 3.4.0 3.6.0, 3.5.5, 3.4.14 tests   0 2 0 1800   Improve reliability of BufferSizeTest. 

Changes made to the testStartupFailure test to remember the old directory and switch back to it after the test has completed.
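The remember-and-restore pattern described can be sketched as try/finally (the holder type is hypothetical, standing in for the test's server setup):

```java
import java.nio.file.Path;

public class DirHolder {
    private Path dir;

    public DirHolder(Path dir) { this.dir = dir; }
    public Path get() { return dir; }
    public void set(Path p) { dir = p; }

    // Remember the old directory, run the scenario, and always switch back,
    // even if the scenario throws, so later tests see the original directory.
    public static void withTempDir(DirHolder holder, Path tmp, Runnable body) {
        Path old = holder.get();
        holder.set(tmp);
        try {
            body.run();
        } finally {
            holder.set(old);
        }
    }
}
```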
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 6 days ago 0|i3w4e7:
ZooKeeper ZOOKEEPER-3093

sync zerror(int rc) with newest error definitions

Bug Closed Trivial Fixed Kent R. Spillner Kent R. Spillner Kent R. Spillner 18/Jul/18 15:52   20/May/19 13:51 19/Jul/18 13:32 3.5.4, 3.6.0 3.6.0, 3.5.5 c client   0 4 0 1800   Add missing #define -> string translations to zerror(int) 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks ago 0|i3w1fb:
ZooKeeper ZOOKEEPER-3092

Pluggable metrics system for ZooKeeper

New Feature Resolved Major Fixed Enrico Olivelli Michael Han Michael Han 17/Jul/18 18:09   07/Jun/19 08:20 07/Jun/19 08:20   3.6.0 metric system   1 4   ZOOKEEPER-3103, ZOOKEEPER-3123, ZOOKEEPER-3143, ZOOKEEPER-3366 ZooKeeper should provide a pluggable metrics system such that various metrics can be collected and reported using different approaches that fit production monitoring / alert / debugging needs.
Historically ZooKeeper provides four letter words and JMX which exposes certain stats / metrics but they are not very flexible in terms of programmatically accessing metrics and connecting metrics to different reporting systems.

There are other projects that's already doing this which can be used for reference, such as bookkeeper metrics service providers and hadoop metrics2.
100% 111000 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks, 1 day ago 0|i3vzz3:
ZooKeeper ZOOKEEPER-3091

Prometheus.io integration

New Feature Resolved Major Fixed Unassigned Hari Sekhon Hari Sekhon 17/Jul/18 09:38   16/Jun/19 05:15 12/Jun/19 04:25 3.4.6 3.6.0 jmx, metric system   0 4 0 28200   Feature Request to add Prometheus /metrics http endpoint for monitoring integration:

[https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E]

Prometheus metrics format for that endpoint:

[https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md]
100% 100% 28200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
40 weeks, 1 day ago 0|i3vz67:
ZooKeeper ZOOKEEPER-3090

change continue to break

Improvement Resolved Minor Fixed Unassigned zhangbo zhangbo 16/Jul/18 10:00   05/Sep/18 20:08 16/Jul/18 10:28     server   0 1 0 5400   It's useful and sufficient to change continue to break, especially when calling getLogFiles(logDir.listFiles(), 0). 100% 100% 5400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 1 day ago change continue to break 0|i3vxnr:
ZooKeeper ZOOKEEPER-3089

ZOOKEEPER-3170 Flaky test:StaticHostProviderTest.testNextDoesNotSleepForZero

Sub-task Closed Minor Not A Problem Andor Molnar maoling maoling 15/Jul/18 08:46   25/Oct/18 10:35 25/Oct/18 10:32 3.4.12   tests   0 2   Flaky test:
https://builds.apache.org/job/ZooKeeper_branch34_java9/322/testReport/junit/org.apache.zookeeper.test/StaticHostProviderTest/testNextDoesNotSleepForZero/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 21 weeks ago 0|i3vwp3:
ZooKeeper ZOOKEEPER-3088

zk c client: should delete tsd keys in a destructor

Bug Open Critical Unresolved Unassigned kevinxw kevinxw 13/Jul/18 01:59   13/Jul/18 01:59   3.4.12   c client   0 2   Crash when unloading libzookeeper_mt.so via dlclose.

The TSD keys should be deleted in a destructor in zk_log.c:
{code:java}
__attribute__((destructor)) void deleteTSDKeys()
{
    pthread_setspecific(time_now_buffer, NULL);
    pthread_setspecific(format_log_msg_buffer, NULL);
    pthread_key_delete(time_now_buffer);
    pthread_key_delete(format_log_msg_buffer);
}
{code}
 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks, 6 days ago 0|i3vutz:
ZooKeeper ZOOKEEPER-3087

Fix findbug warning introduced by ZOOKEEPER-3084.

Task Resolved Major Fixed Michael Han Michael Han Michael Han 12/Jul/18 23:50   13/Jul/18 06:48 13/Jul/18 06:48 3.6.0 3.6.0 tests   0 1 0 1200   Findbug complains after ZOOKEEPER-3084:

bq. DM_EXIT: Method invokes System.exit(...): Invoking System.exit shuts down the entire Java virtual machine. This should only been done when it is appropriate. Such calls make it hard or impossible for your code to be invoked by other code. Consider throwing a RuntimeException instead.

In this case we really should quit, so just make an exception for it.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks, 6 days ago 0|i3vupz:
ZooKeeper ZOOKEEPER-3086

[server] Lack of write timeouts causes quorum to stuck

Bug Open Major Unresolved Unassigned Ruslan Nigmatullin Ruslan Nigmatullin 12/Jul/18 15:10   20/Jul/18 13:49   3.5.4, 3.4.12   quorum   0 6   Linux 4.13.0-32-generic, Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode) A network outage on the leader host can cause the `QuorumPeer` thread to get stuck for a prolonged period of time (2+ hours, depending on TCP keep-alive settings). It effectively stalls the whole ZooKeeper server, making it inoperable. We found it during one of our internal DRTs (Disaster Recovery Tests).

The scenario which triggers the behavior (requires a relatively high ping-load on the follower):
# `Follower.processPacket` processes a `Leader.PING` message
# The leader is network-partitioned
# `Learner.ping` attempts to write to the leader socket
# If the write socket buffer is full (due to other ping/sync calls), `Learner.ping` blocks
# As the leader is partitioned, `Learner.ping` blocks forever due to the lack of a write timeout
# `QuorumPeer` is the only thread reading from the leader socket, effectively meaning the whole server is stuck and can't recover without a manual process restart.

 

Thread dump from the affected server is in attachments.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 34 weeks, 6 days ago 0|i3vu6v:
ZooKeeper ZOOKEEPER-3085

Define constant exit code and add documents

Improvement Resolved Minor Fixed Norbert Kalmár Fangmin Lv Fangmin Lv 09/Jul/18 15:32   06/Aug/18 08:49 06/Aug/18 08:11   3.6.0 server   0 3 0 10200   There are various hard-coded exit codes in different places of ZooKeeper, which makes them harder to track. Also, we need to add some documentation to make it easier to understand what happened when the server exits. 100% 100% 10200 0 newbie, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks, 3 days ago 0|i3vown:
ZooKeeper ZOOKEEPER-3084

Exit when ZooKeeper cannot bind to the leader election port

Improvement Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 06/Jul/18 01:53   13/Jul/18 00:29 11/Jul/18 00:06   3.6.0 quorum, server   0 6 0 6000   In QuorumCnxManager, the listener thread will exit if it cannot bind to the election port after trying 3 times. This keeps the server running but unable to join a quorum; the process dangles there, only rejecting requests. It seems better to exit instead of sitting there doing nothing. 100% 100% 6000 0 easyfix, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
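A minimal sketch of the retry-then-exit behavior (the retry count and exit code are illustrative, not the actual QuorumCnxManager values):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindSketch {
    static ServerSocket bindOrExit(int port, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                ServerSocket ss = new ServerSocket();
                ss.bind(new InetSocketAddress(port));
                return ss;
            } catch (IOException e) {
                // log and retry; previously the listener thread just died here,
                // leaving a dangling server that could never join a quorum
            }
        }
        System.exit(14); // exit so supervisors notice, instead of dangling
        return null;     // unreachable
    }
}
```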
1 year, 35 weeks, 6 days ago 0|i3vlmv:
ZooKeeper ZOOKEEPER-3083

Remove some redundant and noisy log lines

Improvement Closed Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 05/Jul/18 19:29   20/May/19 13:51 18/Jul/18 23:59 3.6.0 3.6.0, 3.5.5 server   0 2 0 12600   Under high client turnover, some log lines around client activity generate an outsized amount of noise in the log files. Reducing a few to debug level won't cause a big hit on admin understanding as there are redundant elements. 100% 100% 12600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks ago 0|i3vli7:
ZooKeeper ZOOKEEPER-3082

Fix server snapshot behavior when out of disk space

Bug Resolved Minor Fixed Brian Nixon Brian Nixon Brian Nixon 05/Jul/18 14:53   04/Oct/19 10:55 30/Jul/18 00:23 3.6.0, 3.4.12, 3.5.5 3.6.0 server   0 6 0 4800   When the ZK server tries to make a snapshot and the machine is out of disk space, the snapshot creation fails and throws an IOException. An empty snapshot file is created (probably because the server is able to create an entry in the dir), but the server is not able to write to the file.
 
If snapshot creation fails, the server commits suicide. When it restarts, it will do so from the last known good snapshot. However, when it tries to make a snapshot again, the same thing happens. This results in lots of empty snapshot files being created. If the DataDirCleanupManager eventually garbage-collects the good snapshot files, then only the empty files remain. At this point, the server is well and truly screwed.
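One common mitigation for the empty-file problem is to write the snapshot to a temp file and only rename it into place on success (a sketch of the pattern, not the actual FileSnap behavior):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SnapshotSketch {
    // Write to a side file first; only a fully written snapshot is renamed
    // into place, so a failed write never leaves an empty snapshot behind.
    static void writeSnapshotAtomically(Path dir, String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        try {
            Files.write(tmp, data);
            Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // clean up the partial file
            throw e;
        }
    }
}
```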
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks ago 0|i3vl5b:
ZooKeeper ZOOKEEPER-3081

DELETE - test hello

Bug Open Minor Unresolved Unassigned pacawat k pacawat k 05/Jul/18 10:40   21/Aug/18 12:53       contrib-bindings, contrib-fatjar   0 2   this is the test for our project 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 37 weeks ago 0|i3vkv3:
ZooKeeper ZOOKEEPER-3080

ZOOKEEPER-3021 Step 1.5 - Separate jute structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 04/Jul/18 11:17   02/Apr/19 06:40 04/Sep/18 10:04 3.6.0 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 3000   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* First iteration - safe changes including moving src/docs to zk-docs, creating zk-it empty directory. Build and conf directory remains unchanged. These changes also have minimum impact on PR’s.
* Second iteration - move src/recipes to zk-recipes.
* Third iteration - move src/contrib to zk-contrib.
* Fourth iteration - move src/c to zk-client (java will be moved in Phase 2)

* *Fifth iteration* - move jute under src directory

* Sixth iteration - move src/java/main to zk-server, which will be further separated in Step 2.

{noformat}
zookeeper
| -bin
| -conf
| -jute
| -zookeeper-client
| | -zookeeper-client-c
| -zookeeper-contrib
| | -zookeeper-contrib-fatjar
| | -zookeeper-contrib-huebrowser
| | -zookeeper-contrib-loggraph
| | -zookeeper-contrib-monitoring
| | -zookeeper-contrib-rest
| | -zookeeper-contrib-zkfuse
| | -zookeeper-contrib-zkperl
| | -zookeeper-contrib-zkpython
| | -zookeeper-contrib-zktreeutil
| \ -zookeeper-contrib-zooinspector
| -zookeeper-docs
| -zookeeper-it (integration tests)
| -zookeeper-server
| -zookeeper-recipes
| | -zookeeper-recipes-election
| | -zookeeper-recipes-lock
\ \ -zookeeper-recipes-queue

{noformat}
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 2 days ago 0|i3vjr3:
ZooKeeper ZOOKEEPER-3079

Fix unsafe use of sprintf(3) for creating IP address strings

Bug Resolved Minor Fixed Kent R. Spillner Kent R. Spillner Kent R. Spillner 03/Jul/18 16:41   10/Jul/18 15:55 10/Jul/18 07:04 3.5.4 3.6.0 c client   0 4 0 3600   The function format_endpoint_info in zookeeper.c causes compiler errors when building with GCC 8 due to a potentially unsafe use of sprintf(3). 100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 36 weeks, 2 days ago 0|i3vilz:
ZooKeeper ZOOKEEPER-3078

Remove unused print_completion_queue function

Improvement Resolved Trivial Fixed Kent R. Spillner Kent R. Spillner Kent R. Spillner 03/Jul/18 16:13   10/Jul/18 15:54 10/Jul/18 07:05 3.5.4 3.6.0 c client   0 2 0 1800   The function print_completion_queue in zookeeper.c causes compilation errors with GCC 8.  However, this function is unused and can safely be removed. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 36 weeks, 2 days ago 0|i3vikn:
ZooKeeper ZOOKEEPER-3077

Build native C library outside of source directory

Improvement Closed Trivial Fixed Kent R. Spillner Kent R. Spillner Kent R. Spillner 03/Jul/18 13:36   02/Apr/19 06:40 19/Jul/18 13:16 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build   0 2 0 3000   Allow building native C library outside of source directory.  Everything works out-of-the-box with the existing autoconf infrastructure, except the location of the generated jute header & source is relative to the build directory. 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks ago 0|i3vicv:
ZooKeeper ZOOKEEPER-3076

DELETE - test

Bug Open Major Unresolved Unassigned Jitkanya Tiawsawat Jitkanya Tiawsawat 03/Jul/18 11:14   21/Aug/18 12:49           0 2   Click the 'New Task' icon in toolbar of the Task List. The 'New Task' wizard will display.
Choose your repository (e.g. 'JAC') from the list of repositories.
Select Next and choose a project from the list of projects.
Select Finish to open the editor for entering task details.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 37 weeks, 2 days ago 0|i3vi53:
ZooKeeper ZOOKEEPER-3075

Cannot run cppunit tests on branch 3.4 on Fedora 26

Bug Open Critical Unresolved Unassigned Enrico Olivelli Enrico Olivelli 03/Jul/18 04:54   03/Jul/18 04:54   3.4.13   c client, tests   0 2   Fedora 26

java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)

 

ant -version
Apache Ant(TM) version 1.9.7 compiled on April 9 2016

Cppunit:

cppunit-devel x86_64 1.13.2-3.fc26
cppunit x86_64 1.13.2-3.fc26

gcc --version
gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTIN7CppUnit23TestSuiteBuilderContextI18Zookeeper_watchersEE[_ZTIN7CppUnit23TestSuiteBuilderContextI18Zookeeper_watchersEE]+0x10): undefined reference to `typeinfo for CppUnit::TestSuiteBuilderContextBase'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTIN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTIN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x10): undefined reference to `typeinfo for CppUnit::TestCase'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x20): undefined reference to `CppUnit::TestCase::run(CppUnit::TestResult*)'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x28): undefined reference to `CppUnit::TestLeaf::countTestCases() const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x30): undefined reference to `CppUnit::TestLeaf::getChildTestCount() const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x38): undefined reference to `CppUnit::Test::getChildTestAt(int) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x40): undefined reference to `CppUnit::TestCase::getName[abi:cxx11]() const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x48): undefined reference to `CppUnit::Test::findTestPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, CppUnit::TestPath&) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x50): undefined reference to `CppUnit::Test::findTestPath(CppUnit::Test const*, CppUnit::TestPath&) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x58): undefined reference to `CppUnit::Test::findTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x60): undefined reference to `CppUnit::Test::resolveTestPath(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x68): undefined reference to `CppUnit::Test::checkIsValidIndex(int) const'
     [exec] zktest_st-TestWatchers.o:(.rodata._ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE[_ZTVN7CppUnit10TestCallerI18Zookeeper_watchersEE]+0x70): undefined reference to `CppUnit::TestLeaf::doGetChildTestAt(int) const'
     [exec] zktest_st-LibCSymTable.o: In function `LibCSymTable::LibCSymTable()':
     [exec] /xxx/zookeeper-3.4.13/src/c/tests/LibCSymTable.cc:36: undefined reference to `dlsym'
     [exec] /xxx/zookeeper-3.4.13/src/c/tests/LibCSymTable.cc:37: undefined reference to `dlsym'
     [exec] /xxx/zookeeper-3.4.13/src/c/tests/LibCSymTable.cc:38: undefined reference to `dlsym'
     [exec] /xxx/zookeeper-3.4.13/src/c/tests/LibCSymTable.cc:39: undefined reference to `dlsym'
     [exec] /xxxzookeeper-3.4.13/src/c/tests/LibCSymTable.cc:40: undefined reference to `dlsym'
     [exec] zktest_st-LibCSymTable.o:/xxxzookeeper-3.4.13/src/c/tests/LibCSymTable.cc:41: more undefined references to `dlsym' follow
     [exec] collect2: error: ld returned 1 exit status
     [exec] make[1]: *** [Makefile:822: zktest-st] Error 1
     [exec] make[1]: uscita dalla directory "/xxxzookeeper-3.4.13/build/test/test-cppunit"
     [exec] make: *** [Makefile:1718: check-am] Error 2

BUILD FAILED
/xxx/zookeeper-3.4.13/build.xml:1471: The following error occurred while executing this line:
/xxx/zookeeper-3.4.13/build.xml:1481: exec returned: 2
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 37 weeks, 2 days ago 0|i3vhhb:
ZooKeeper ZOOKEEPER-3074

Flaky test:org.apache.zookeeper.server.ServerStatsTest.testLatencyMetrics

Test Closed Minor Fixed maoling maoling maoling 03/Jul/18 03:34   20/May/19 13:50 18/Jul/18 09:49   3.6.0, 3.5.5 tests 03/Jul/18 0 2 0 10200   Jenkins complains about this Flaky test: https://builds.apache.org/job/ZooKeeper-trunk/77/testReport/junit/org.apache.zookeeper.server/ServerStatsTest/testLatencyMetrics/ 100% 100% 10200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 35 weeks, 1 day ago 0|i3vhc7:
ZooKeeper ZOOKEEPER-3073

fix couple of typos

Wish Resolved Minor Fixed Christine Poerschke Christine Poerschke Christine Poerschke 29/Jun/18 12:51   10/Jul/18 15:54 10/Jul/18 07:30   3.6.0     0 2 0 1800   Saw a number of open pull requests concerning typos but without associated JIRA ticket and so here taking the opportunity to gather them up (where not already otherwise taken care of) plus couple of additions I noticed whilst my other code was doing its compiling-and-testing thing. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 36 weeks, 2 days ago 0|i3ve6v:
ZooKeeper ZOOKEEPER-3072

Race condition in throttling

Bug Resolved Major Fixed Botond Hejj Botond Hejj Botond Hejj 29/Jun/18 10:00   24/Nov/18 14:58 27/Jul/18 22:42 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4 3.5.4, 3.6.0 server   0 4 0 9600   There is a race condition in the server throttling code. It is possible that the disableRecv is called after enableRecv.

Basically, the I/O work thread does this in processPacket: [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102] 

{code:java}
                submitRequest(si);
            }
        }
        cnxn.incrOutstandingRequests(h);
    }
{code}

 

incrOutstandingRequests() checks for limit breach, and potentially turns on throttling, [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]

 

submitRequest() will create a logical request and en-queue it so that Processor thread can pick it up. After being de-queued by Processor thread, it does necessary handling, and then calls this [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459] :

 

{code:java}
            cnxn.sendResponse(hdr, rsp, "response");
{code}

 

and in sendResponse(), it first appends to outgoing buffer, and then checks if un-throttle is needed:  [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]

 

However, if there is a context switch between submitRequest() and cnxn.incrOutstandingRequests(), so that Processor thread completes cnxn.sendResponse() call before I/O thread switches back, then enableRecv() will happen before disableRecv(), and enableRecv() will fail the CAS ops, while disableRecv() will succeed, resulting in a deadlock: un-throttle is needed for letting in requests, and sendResponse is needed to trigger un-throttle, but sendResponse() requires an incoming message. From that point on, ZK server will no longer select the affected client socket for read, leading to the observed client-side failure in the subject.

If you would like to reproduce this, setting globalOutstandingLimit down to 1 makes it easier to reproduce, as throttling starts with fewer requests.

 
100% 100% 9600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 5 days ago 0|i3vdyn:
ZooKeeper ZOOKEEPER-3071

Add a config parameter to control transaction log size

Improvement Resolved Minor Fixed Suyog Mapara Suyog Mapara Suyog Mapara 28/Jun/18 15:55   12/Nov/18 20:14 12/Nov/18 16:36 3.6.0 3.6.0 server   0 2 0 33600   Currently we only have a knob to control maximum number of transactions in the log file but there is no direct way to control actual size of the file. This has implications on the time it takes for a learner to sync using transaction log as the leader needs to seek the file to find the appropriate transaction. This is a proposal for adding a config parameter to control the transaction log size. 100% 100% 33600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 18 weeks, 2 days ago 0|i3vcof:
ZooKeeper ZOOKEEPER-3070

Not Able to Change Zookeeper Logging via JMX Call

Improvement Resolved Blocker Not A Problem Unassigned jahar jahar 28/Jun/18 02:38   28/Jun/18 03:30 28/Jun/18 03:30 3.4.5 3.4.5 jmx   0 1   Using Java 8 for writing standalone code to update the MBeans in zookeeper which is running in Windows machine for POC purpose.

Zookeeper Version is : 3.4.5
Hi,

I wanted to change the logging level of zookeeper dynamically via a JMX call programmatically. Apache Zookeeper official page specifies that it is possible to change the Mbeans via JMX calls and I have verified this through JConsole also.

!zkpg.JPG!

But the problem is that I am not able to update the Mbeans related to log4j through my code. I do see an API which can be used to access the Mbeans related to Object "org.apache.ZooKeeperService:name0=StandaloneServer_port-1" below is the screengrab of Jconsole and my code:

!jconsole.JPG!

 

Here goes my Code:
{code:java}
public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:2167/jmxrmi");
    JMXConnector jmxConnector = JMXConnectorFactory.connect(url);
    MBeanServerConnection mbeanServerConnection = jmxConnector.getMBeanServerConnection();
    ObjectName mbeanName = new ObjectName("org.apache.ZooKeeperService:name0=StandaloneServer_port-1");
    ZooKeeperServerMXBean newProxyInstance = MBeanServerInvocationHandler.newProxyInstance(mbeanServerConnection,
            mbeanName, ZooKeeperServerMXBean.class, true);
    System.out.println(newProxyInstance.getClientPort());
}
{code}
 

 

I don't see any API which can be used to access and update the log4j MBeans, e.g. "root". What I want to achieve is to update the logging of zookeeper without taking a restart.

Please advise if some API is exposed to achieve this.
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
1 year, 38 weeks ago 0|i3vbk7:
ZooKeeper ZOOKEEPER-3069

document: is mutual auth with DIGEST-MD5 insecure?

Bug Open Minor Unresolved Unassigned Jan Zerebecki Jan Zerebecki 25/Jun/18 13:12   28/Jun/18 07:28       documentation   0 2   The [documentation regarding mutual ZooKeeper server to server authentication with DIGEST-MD5|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication#Server-Servermutualauthentication-DIGEST-MD5basedauthentication] currently doesn't mention whether this is insecure. [DIGEST-MD5 was declared obsolete in 2011 due to security problems.|https://tools.ietf.org/html/rfc6331]

This is in relation to whether this is an effective mitigation for CVE-2018-8012 AKA ZOOKEEPER-1045, as mentioned in [https://lists.apache.org/thread.html/c75147028c1c79bdebd4f8fa5db2b77da85de2b05ecc0d54d708b393@%3Cdev.zookeeper.apache.org%3E].

Would the following be a fitting addition to the documentation?:

DIGEST-MD5 based authentication should not be relied on, as it is insecure; it is only provided for test purposes.

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 38 weeks ago 0|i3v7d3:
ZooKeeper ZOOKEEPER-3068

Improve C client logging of IPv6 hosts

Improvement Resolved Trivial Fixed Brian Nixon Brian Nixon Brian Nixon 22/Jun/18 14:18   20/Jul/18 13:02 20/Jul/18 11:41 3.6.0 3.6.0 c client   0 2 0 6600   The C client formats host-port pairings as [host:port] when logging. This is visually confusing when the host is an IPv6 address (see below). In that case, it would be preferable to cleanly separate the IPv6 address from the port.
{code:java}
ZOO_INFO@check_events@2736: initiated connection to server [2401:db00:1020:40bf:face:0:5:0:2181]{code}
100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 34 weeks, 6 days ago 0|i3v587:
ZooKeeper ZOOKEEPER-3067

Optionally suppress client environment logging.

Task Resolved Minor Fixed James Peach James Peach James Peach 20/Jun/18 13:25   24/Nov/18 14:59 27/Jul/18 06:40   3.6.0 c client   0 3 0 6600   It would be helpful to add a {{zookeeper_init}} flag to suppress the client environment logging. In our deployment, this causes LDAP lookups for the current user ID, which is otherwise an unnecessary service dependency for ZooKeeper clients. 100% 100% 6600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 33 weeks, 6 days ago 0|i3v26n:
ZooKeeper ZOOKEEPER-3066

Expose on JMX of Followers the id of the current leader

New Feature Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 19/Jun/18 09:46   20/May/19 13:50 26/Jun/18 05:19 3.5.4, 3.6.0 3.6.0, 3.5.5 jmx, leaderElection, quorum   0 3 0 22800   It would be useful for the JMX beans published on Follower peers to include information about the current "leader".

This information is only available using 4 letter words
100% 100% 22800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 38 weeks, 2 days ago 0|i3v0h3:
ZooKeeper ZOOKEEPER-3065

Refactor existing reconfig tests in StaticHostProviderTest

Test Open Minor Unresolved Andor Molnar Andor Molnar Andor Molnar 18/Jun/18 08:52   14/Dec/19 06:08   3.6.0 3.7.0 tests   0 1   The following issues would be nice to address:
* Tests cover addresses with IP addresses only, a few of them test unresolved hostnames, but ideal would be to create Test parameters and run all tests for both cases,
* Test methods should be split into multiple to cover one test case / test method,
* Style: instead of assertTrue(a < b), we should use assertThat(b, greaterThan(a))
* Extract redundant code snippets into methods
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 39 weeks, 3 days ago 0|i3uyyf:
ZooKeeper ZOOKEEPER-3064

Format overflow warning when building with GCC 8.1

Task Open Minor Unresolved Unassigned James Peach James Peach 15/Jun/18 12:27   15/Jun/18 12:31           0 1   Building ZK 3.4.8 with gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)

{noformat}
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o zookeeper.o
...
src/zookeeper.c: In function ‘format_endpoint_info’:
src/zookeeper.c:3504:21: error: ‘%d’ directive writing between 1 and 5 bytes into a region of size between 0 and 127 [-Werror=format-overflow=]
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~
src/zookeeper.c:3504:17: note: directive argument in the range [0, 65535]
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~~~~~~
src/zookeeper.c:3504:5: note: ‘sprintf’ output between 3 and 134 bytes into a destination of size 128
sprintf(buf,"%s:%d",addrstr,ntohs(port));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{noformat}

Looks like gcc wants [format_endpoint_info|https://github.com/apache/zookeeper/blob/master/src/c/src/zookeeper.c#L4357] to use snprintf.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 39 weeks, 6 days ago 0|i3ux8v:
ZooKeeper ZOOKEEPER-3063

Track outstanding changes with ArrayDeque

Improvement Closed Trivial Fixed Yisong Yue Yisong Yue Yisong Yue 13/Jun/18 16:44   20/May/19 13:50 15/Jun/18 01:41   3.6.0, 3.5.5 server   0 2 0 1200   Outstanding changes are tracked with an ArrayList, which has O(N) remove from head (and possibly add) performance. This means that as we get further behind, we will slow down the processing of outstanding changes, which would make us get further behind.
We should switch to using ArrayDeque which achieves O(1) add and remove on both ends, which should result in much happiness.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 39 weeks, 6 days ago 0|i3utzz:
ZooKeeper ZOOKEEPER-3062

introduce fsync.warningthresholdms constant for FileTxnLog LOG.warn message

Task Closed Minor Fixed Christine Poerschke Christine Poerschke Christine Poerschke 13/Jun/18 09:45   04/Oct/19 10:55 01/Aug/18 15:23 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14     0 4 0 4800   The
{code}
fsync-ing the write ahead log in ... took ... ms which will adversely effect operation latency. File size is ... bytes. See the ZooKeeper troubleshooting guide
{code}
warning could mention the {{fsync.warningthresholdms}} configurable property; that would make the property easier to discover, and differences in configuration would be easier to spot when interpreting historical vs. current logs or logs from different ensembles.
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
1 year, 33 weeks ago
Reviewed
0|i3utfz:
ZooKeeper ZOOKEEPER-3061

add more details to 'Unhandled scenario for peer' log.warn message

Task Resolved Minor Fixed Christine Poerschke Christine Poerschke Christine Poerschke 13/Jun/18 09:00   24/Nov/18 14:58 27/Jul/18 22:33   3.6.0     0 3 0 2400   A few lines earlier the {{LOG.info("Synchronizing with Follower sid: ...}} logging already contains most relevant details but it would be convenient to more directly have full details in the {{LOG.warn("Unhandled scenario for peer sid: ...}} itself. 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 33 weeks, 5 days ago 0|i3utbz:
ZooKeeper ZOOKEEPER-3060

Logging the server local port to stderr

Improvement Resolved Minor Not A Problem Mohamed Jeelani Mohamed Jeelani Mohamed Jeelani 12/Jun/18 15:40   26/Jun/18 13:46 26/Jun/18 13:46 3.4.12   server   1 2 0 2400   This simple, straightforward patch adds logging of the server local port to stderr, which simplifies debugging by saving you from having to look that up. 100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 38 weeks, 2 days ago 0|i3usb3:
ZooKeeper ZOOKEEPER-3059

EventThread leak in case of Sasl AuthFailed

Bug Closed Critical Fixed Abhishek Singh Chouhan Abhishek Singh Chouhan Abhishek Singh Chouhan 11/Jun/18 13:21   04/Oct/19 10:55 25/Jun/18 06:31 3.4.12 3.6.0, 3.5.5     0 5 0 7800   In case of an authFailed sasl event we shutdown the send thread however we never close the event thread. Even if the client tries to close the connection it results in a no-op since we check for cnxn.getState().isAlive() which results in negative for auth failed state and we return without cleaning up. For applications that retry in case of auth failed by closing the existing connection and then trying to reconnect(eg. hbase replication) this eventually ends up exhausting the system resources. 100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 38 weeks, 3 days ago 0|i3uqkn:
ZooKeeper ZOOKEEPER-3058

Do length check first before actual byte check in compareBytes method of Utils class

Improvement Open Minor Unresolved Unassigned Hosur Narahari Hosur Narahari 08/Jun/18 12:23   02/Jul/18 09:49       jute   0 1 0 3000   In the compareBytes method of the org.apache.jute.Utils class, all the individual bytes of the 2 byte arrays are compared and then their lengths are compared. We can improve performance by doing the length check first, since a single if condition (an O(1) operation) can rule out equality, rather than looping through the arrays (an O(n) operation). 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 40 weeks, 6 days ago 0|i3uo1r:
ZooKeeper ZOOKEEPER-3057

Fix IPv6 literal usage

Bug Closed Minor Resolved Mohamed Jeelani Mohamed Jeelani Mohamed Jeelani 06/Jun/18 20:37   14/Feb/20 10:23 13/Oct/18 10:00 3.4.12 3.6.0, 3.5.7 other   0 7 0 30600   IPv6 literals are not parsed correctly, which can lead to errors and is, at best, an eyesore. They need to be parsed and displayed correctly. 100% 100% 30600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks, 3 days ago 0|i3ulqn:
ZooKeeper ZOOKEEPER-3056

Fails to load database with missing snapshot file but valid transaction log file

Bug Closed Critical Fixed Michael Han Michael Han Michael Han 05/Jun/18 11:03   23/Dec/19 10:52 03/Sep/19 02:54 3.5.3, 3.5.4 3.6.0, 3.5.6 server   1 19 0 21000   [An issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f1472cccc1a93b4b@%3Cdev.zookeeper.apache.org%3E] was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing snapshot file.

The code that complains about the missing snapshot file is [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206], which was introduced as part of ZOOKEEPER-2325.

With this check, ZK will not load the db without a snapshot file, even if the transaction log files are present and valid. This could be a problem when restoring a ZK instance which does not have a snapshot file but has a sound state (e.g. it crashed before being able to take the first snapshot, with a large snapCount parameter configured).

 

*how to use this fix*

Add zookeeper.snapshot.trust.empty=true to your server configuration file and start the server.

This property will skip the check.

It is recommended to remove the property once you have a working server, because the check is important to ensure that the system is in good shape.
100% 100% 21000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
14 weeks ago 0|i3ujan:
ZooKeeper ZOOKEEPER-3055

Unable to connect to remote host: Connection refused

Bug Open Minor Unresolved Unassigned Remil Remil 02/Jun/18 12:27   12/Jun/18 07:56           0 2   hadoopuser@sherin-VirtualBox:~$ sudo su -p - zookeeper -c "/usr/local/zookeeper/zookeeper-3.4.12/bin/zkServer.sh start" ZooKeeper JMX enabled by default
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
hadoopuser@sherin-VirtualBox:~$ telnet localhost 2181
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
hadoopuser@sherin-VirtualBox:~$

 

hadoopuser@sherin-VirtualBox:~$ telnet localhost 127.0.0.1:2181
telnet: could not resolve localhost/127.0.0.1:2181: Servname not supported for ai_socktype

 

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 40 weeks, 2 days ago 0|i3ufpz:
ZooKeeper ZOOKEEPER-3054

ipv6 detection in configure.ac is broken

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 29/May/18 19:40   29/May/18 19:40   3.6.0, 3.5.5   c client, tests   0 0   When I run the test using (jdk8 tag):
https://hub.docker.com/r/phunt/zk-docker-devenv.ubuntu.14.04/tags/
it fails with an IPV6 failure. afaict the container does not have ipv6 configured, although the kernel has it available as a feature. I believe this to be the real issue - it's a kernel feature but not available in the runtime.
newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks, 2 days ago 0|i3uacn:
ZooKeeper ZOOKEEPER-3053

add remove watches capabilities to the c cli

Bug Open Major Unresolved Balazs Meszaros Patrick D. Hunt Patrick D. Hunt 29/May/18 19:37   19/Mar/19 09:08   3.6.0, 3.5.5   c client, tests   0 0   It would be good to be able to exercise the remove watches functionality from the c client cli. Mostly for testing purposes. newbie, remove_watches 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks, 2 days ago 0|i3uac7:
ZooKeeper ZOOKEEPER-3052

testReadOnly fails on slow host

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 29/May/18 19:36   05/Feb/20 07:16   3.6.0, 3.5.5 3.7.0, 3.5.8 c client, tests   0 0   When running on a slow host (docker ubuntu on mac) the "sleep(3)" in tests/zkServer.sh is not a sufficient wait for the server to enter RO mode. Recommend adding an "isro" 4lw check in the script to wait until the server is in RO mode. If this takes longer than 60 seconds, zkServer.sh should fail.

For more background see the comment here:
https://github.com/apache/zookeeper/pull/522#issuecomment-392980087
newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks, 2 days ago 0|i3uabz:
ZooKeeper ZOOKEEPER-3051

owasp complaining about jackson version used

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/May/18 15:55   20/May/19 13:50 22/May/18 23:34 3.5.4, 3.6.0 3.6.0, 3.5.5 server   0 2 0 1800   The owasp target is complaining about jackson version. We should update to the latest. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 43 weeks, 1 day ago 0|i3tyrb:
ZooKeeper ZOOKEEPER-3050

owasp ant target is highlighting jetty version needs to be updated

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/May/18 15:25   20/May/19 13:50 22/May/18 00:37 3.5.4, 3.6.0 3.6.0, 3.5.5 server   0 2 0 1200   The owasp target highlights that we need to update to new jetty version. 100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 43 weeks, 2 days ago 0|i3typj:
ZooKeeper ZOOKEEPER-3049

would zookeeper transaction(multi) block the concurrent read?

Wish Open Major Unresolved Unassigned wayne wayne 19/May/18 23:25   24/May/18 12:43       documentation   0 3   For instance, the original data for znode1 and znode2 are 2 and 4 respectively. I want to perform increment operations over them. Finally, I would get (3, 5) for znode1 and znode2. In order to keep atomicity, I used multi() api. Is there any possibility that any clients could read (3, 4) concurrently? That is, the read happened after znode1++ and before znode2++? 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
ZooKeeper ZOOKEEPER-3048

ZOOKEEPER-3170 Track OutOfMemory failures on Flaky Dashboard

Sub-task Open Minor Unresolved Bogdan Kanivets Bogdan Kanivets Bogdan Kanivets 19/May/18 21:00   15/Oct/18 06:26       build-infrastructure, tests   0 1   Flaky Tests [Dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html] should track which tests failed because of OutOfMemory exception.

Related issue: [ZOOKEEPER-3044|https://issues.apache.org/jira/browse/ZOOKEEPER-3044]
ZooKeeper ZOOKEEPER-3047

ZOOKEEPER-3170 flaky test LearnerSnapshotThrottlerTest

Sub-task Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 19/May/18 19:49   21/Nov/18 21:26   3.5.4, 3.6.0, 3.4.12   tests   0 2   * LearnerSnapshotThrottlerTest is flaky - failed during a clover run

{noformat}
2018-05-19 13:39:24,510 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testHighContentionWithTimeout
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.zookeeper.server.quorum.LearnerSnapshotThrottlerTest.__CLR4_2_1a5fyaprev(LearnerSnapshotThrottlerTest.java:216)
{noformat}
flaky, newbie
ZooKeeper ZOOKEEPER-3046

ZOOKEEPER-3170 testManyChildWatchersAutoReset is flaky

Sub-task Closed Minor Fixed Bogdan Kanivets Bogdan Kanivets Bogdan Kanivets 17/May/18 17:46   20/May/19 13:50 11/Mar/19 16:57 3.5.3, 3.4.12 3.6.0, 3.5.5, 3.4.15 tests   0 5 0 13200   According to the [dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html] testManyChildWatchersAutoReset is flaky in 3.4 and 3.5

[ZooKeeper_branch34_java10|https://builds.apache.org/job/ZooKeeper_branch34_java10//13]

[ZooKeeper_branch35_java9|https://builds.apache.org/job/ZooKeeper_branch35_java9/253]

Test times out and because of that ant doesn't capture any output.
flaky, pull-request-available
Reviewed
ZooKeeper ZOOKEEPER-3045

NullPointerException and continuous EndOfStreamException warnings in ZooKeeper stdout after stop and start of ZooKeeper

Bug Open Major Unresolved Unassigned Zeynep Arikoglu Zeynep Arikoglu 17/May/18 08:43   19/Aug/18 08:00   3.4.10   server   0 2   Ubuntu 16 and Centos 7. After stopping and starting ZooKeeper, its stdout is sporadically polluted with EndOfStreamException warnings. As can be seen from the attached output, the warnings are emitted at 0.2 millisecond intervals. This goes on until ZooKeeper is stopped. If we are dumping the output to a file, this fills up the storage immediately.
ZooKeeper ZOOKEEPER-3044

OutOfMemoryError exceptions in Jenkins when running tests

Improvement Resolved Major Fixed Patrick D. Hunt Bogdan Kanivets Bogdan Kanivets 15/May/18 14:10   19/May/18 19:40 18/May/18 12:04 3.6.0 3.6.0 build-infrastructure, tests   0 1   I've spot checked some failing test results and noticed OutOfMemoryError on some of them

[trunk - java 10 - testQuorumSystemChange - build # 65|https://builds.apache.org/job/ZooKeeper-trunk-java10/65/testReport/junit/org.apache.zookeeper.test/ReconfigTest/testQuorumSystemChange]

[trunk - java 10 - testQuorumSystemChange - build # 69|https://builds.apache.org/job/ZooKeeper-trunk-java10/69/testReport/junit/org.apache.zookeeper.test/ReconfigTest/testQuorumSystemChange]

[trunk - java 9 - testWatcherAutoResetDisabledWithGlobal|https://builds.apache.org/job/ZooKeeper-trunk-java9/775/testReport/junit/org.apache.zookeeper.test/WatcherTest/testWatcherAutoResetDisabledWithGlobal]

[trunk - java 10 - testHammer|https://builds.apache.org/job/ZooKeeper-trunk-java10/70/testReport/junit/org.apache.zookeeper.test/AsyncHammerTest/testHammer]

Right now the test command is:
ant -Dtest.junit.maxmem=2g -Dtest.output=no -Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.9 clean test-core-java

Is it possible to decrease the number of threads or increase maxmem?
 
ZooKeeper ZOOKEEPER-3043

QuorumKerberosHostBasedAuthTest fails on Linux box: Unable to parse:includedir /etc/krb5.conf.d/

Improvement Closed Major Fixed Enrico Olivelli Enrico Olivelli Enrico Olivelli 14/May/18 07:32   17/Jul/18 00:50 29/May/18 20:18 3.5.4, 3.6.0, 3.4.12 3.6.0, 3.4.13, 3.5.5 build, kerberos, tests   0 3 0 11400   I am testing 3.5.4-BETA rc0 and I get this error while running tests

ant -Dtestcase=QuorumKerberosHostBasedAuthTest test-core-java

 

{code}

Testsuite: org.apache.zookeeper.server.quorum.auth.QuorumKerberosHostBasedAuthTest
Tests run: 0, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1,029 sec
------------- Standard Output ---------------
2018-05-14 13:29:36,829 [myid:] - INFO  [main:JUnit4ZKTestRunner@47] - No test.method specified. using default methods.
2018-05-14 13:29:36,834 [myid:] - INFO  [main:JUnit4ZKTestRunner@47] - No test.method specified. using default methods.
2018-05-14 13:29:36,839 [myid:] - INFO  [main:MiniKdc@230] - Configuration:
2018-05-14 13:29:36,839 [myid:] - INFO  [main:MiniKdc@231] - ---------------------------------------------------------------
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   debug: false
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   transport: TCP
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   max.ticket.lifetime: 86400000
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   org.name: EXAMPLE
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   kdc.port: 0
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   org.domain: COM
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   max.renewable.lifetime: 604800000
2018-05-14 13:29:36,841 [myid:] - INFO  [main:MiniKdc@233] -   instance: DefaultKrbServer
2018-05-14 13:29:36,842 [myid:] - INFO  [main:MiniKdc@233] -   kdc.bind.address: localhost
2018-05-14 13:29:36,842 [myid:] - INFO  [main:MiniKdc@235] - ---------------------------------------------------------------
2018-05-14 13:29:37,855 [myid:] - INFO  [main:MiniKdc@356] - MiniKdc stopped.
------------- ---------------- ---------------

Testcase: org.apache.zookeeper.server.quorum.auth.QuorumKerberosHostBasedAuthTest took 0 sec
        Caused an ERROR
Unable to parse:includedir /etc/krb5.conf.d/
java.lang.RuntimeException: Unable to parse:includedir /etc/krb5.conf.d/
        at org.apache.kerby.kerberos.kerb.common.Krb5Parser.load(Krb5Parser.java:72)
        at org.apache.kerby.kerberos.kerb.common.Krb5Conf.addKrb5Config(Krb5Conf.java:47)
        at org.apache.kerby.kerberos.kerb.client.ClientUtil.getDefaultConfig(ClientUtil.java:94)
        at org.apache.kerby.kerberos.kerb.client.KrbClientBase.<init>(KrbClientBase.java:51)
        at org.apache.kerby.kerberos.kerb.client.KrbClient.<init>(KrbClient.java:38)
        at org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.<init>(SimpleKdcServer.java:54)
        at org.apache.zookeeper.server.quorum.auth.MiniKdc.start(MiniKdc.java:285)
        at org.apache.zookeeper.server.quorum.auth.KerberosSecurityTestcase.startMiniKdc(KerberosSecurityTestcase.java:70)
        at org.apache.zookeeper.server.quorum.auth.KerberosSecurityTestcase.setUpSasl(KerberosSecurityTestcase.java:56)

{code}
pull-request-available
ZooKeeper ZOOKEEPER-3042

testFailedTxnAsPartOfQuorumLoss is flaky

Bug Closed Minor Fixed Bogdan Kanivets Bogdan Kanivets Bogdan Kanivets 13/May/18 04:32   20/May/19 13:50 10/Jul/18 06:18 3.5.3, 3.6.0, 3.4.12 3.5.5 tests   0 3 0 10800   According to the [dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html] testFailedTxnAsPartOfQuorumLoss is flaky. I've looked at some logs and there are multiple causes of flakiness. One of them is in this line after step 5
{code:java}
Assert.assertEquals(1, outstanding.size());
{code}
For example [this|https://builds.apache.org/job/ZooKeeper_branch35_java10/10/artifact/build/test/logs] build of 3.5

I was able to reproduce this particular issue in debug mode and the problem is that the 'outstanding' map can also have 'closeSession' entries that are expected.

I'll submit a patch to relax this check.
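A sketch of what such a relaxed check could look like (illustrative names and types, not the actual test code): count only the entries that are not expected closeSession transactions before asserting:

```java
import java.util.Map;

// Hypothetical sketch: ignore expected closeSession entries when counting
// outstanding proposals, so the assertion tolerates them.
class OutstandingCheck {
    static long countNonCloseSession(Map<Long, String> outstanding) {
        return outstanding.values().stream()
                .filter(type -> !"closeSession".equals(type))
                .count();
    }
}
```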
flaky, pull-request-available
ZooKeeper ZOOKEEPER-3041

Typo in error message, affects log analysis

Bug Closed Trivial Fixed Hugh O'Brien Hugh O'Brien Hugh O'Brien 13/May/18 01:47   17/Jul/18 00:49 16/May/18 13:36 3.5.3 3.6.0, 3.4.13, 3.5.5     0 3 0 600   simple typo

 

PR here: https://github.com/apache/zookeeper/pull/498/commits/a8cb7f668d31a7bcf12481409328a886231020f6
pull-request-available
Reviewed
ZooKeeper ZOOKEEPER-3040

flaky test EphemeralNodeDeletionTest

Bug Resolved Major Cannot Reproduce Norbert Kalmár Patrick D. Hunt Patrick D. Hunt 10/May/18 18:21   06/Aug/18 05:33 06/Aug/18 05:33 3.5.4, 3.6.0, 3.4.12   tests   0 4   Flaky test EphemeralNodeDeletionTest

{noformat}
java.lang.AssertionError: After session close ephemeral node must be deleted expected null, but was:<4294967302,4294967302,1525988536834,1525988536834,0,0,0,144127862257483776,1,0,4294967302
 {noformat}
flaky
ZooKeeper ZOOKEEPER-3039

TxnLogToolkit uses Scanner badly

Bug Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 09/May/18 19:12   17/Jul/18 00:49 15/May/18 12:58 3.5.4, 3.6.0, 3.4.13 3.5.4, 3.6.0, 3.4.13     0 3 0 600   If more than one CRC error is found in the txn log file, TxnLogToolkit fails to get an answer for the second one, because it has already closed the Scanner, which probably also closed the input stream, so an exception is thrown:
{noformat}
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 5:16:05 AM PDT session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
CRC ERROR - 4/5/18 5:17:34 AM PDT session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
Would you like to fix it (Yes/No/Abort) ? Exception in thread "main" java.util.NoSuchElementException
at java.util.Scanner.throwFor(Scanner.java:862)
at java.util.Scanner.next(Scanner.java:1371)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.askForFix(TxnLogToolkit.java:208)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:175)
at org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:101){noformat}
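The failure mode is reproducible with plain JDK classes; a minimal self-contained demo (not the TxnLogToolkit code itself) shows that closing the first Scanner closes the shared input stream, so a second Scanner on the same stream finds no tokens and next() throws NoSuchElementException:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.NoSuchElementException;
import java.util.Scanner;

class ScannerCloseDemo {
    // Wraps a stream so close() really takes effect (ByteArrayInputStream's
    // own close() is a no-op, which would hide the bug).
    static InputStream closeable(InputStream raw) {
        return new FilterInputStream(raw) {
            private boolean closed;
            @Override public void close() { closed = true; }
            @Override public int read(byte[] b, int off, int len) throws IOException {
                if (closed) throw new IOException("Stream closed");
                return super.read(b, off, len);
            }
        };
    }

    // Mimics askForFix() being called twice with a fresh Scanner each time:
    // the first Scanner's close() closes the shared input, so the second
    // prompt has no tokens and Scanner.next() throws NoSuchElementException.
    static boolean secondPromptFails() {
        InputStream in = closeable(new ByteArrayInputStream("y\ny\n".getBytes()));
        Scanner first = new Scanner(in);
        first.next();   // first CRC prompt answered "y"
        first.close();  // closes the shared input stream
        try {
            new Scanner(in).next();  // second CRC prompt
            return false;
        } catch (NoSuchElementException expected) {
            return true;
        }
    }
}
```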
pull-request-available
Reviewed
ZooKeeper ZOOKEEPER-3038

Cleanup some nitpicks in TTL implementation

Bug Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 09/May/18 18:39   21/Jan/19 09:54 10/May/18 00:14 3.5.3 3.5.4, 3.6.0 server   0 3   A few nitpicks which need to be cleaned up:

1. Rename OldEphemeralType --> EphemeralTypeEmulate353
2. Remove unused method: getTTL()
3. Remove unused import from QuorumPeer

 
ttl_nodes
ZooKeeper ZOOKEEPER-3037

Add JvmPauseMonitor to ZooKeeper

Improvement Resolved Minor Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 09/May/18 10:57   18/Apr/19 16:18 18/Apr/19 13:18 3.5.3, 3.4.12 3.6.0 contrib   2 7 0 5400   After a ZK crash or client timeout, it's sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it helps post-mortem analysis in a lot of cases too.) Basically it has a daemon that sleeps for one second, and if the sleep time exceeds 1s by more than a threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in hadoop-common, but ZK should not depend on that package. Since this is a straightforward implementation, and in the past five years the few commits it has received were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature; by default it can be off.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK (under zookeeper/server/util/) called JvmPauseMonitor.
- Make feature configurable, by default: OFF
- Make sleep time and threshold time configurable
- Update documentation
- Add [current size of the heap OR % of heap used] in the log entry whenever the sleep threshold has been exceeded by a lot (10s)
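The mechanism described above can be sketched as follows (class name and thresholds are illustrative, not the Hadoop class's exact API):

```java
// Hypothetical sketch of the pause-monitor idea: a daemon sleeps for a
// fixed interval and reports whenever the observed sleep overshoots it.
class PauseMonitorSketch {
    static final long SLEEP_MS = 1000;
    static final long INFO_THRESHOLD_MS = 1000;   // configurable in ZK's case
    static final long WARN_THRESHOLD_MS = 10000;  // configurable in ZK's case

    // Classify how much longer than SLEEP_MS the sleep actually took.
    static String classify(long extraSleepMs) {
        if (extraSleepMs > WARN_THRESHOLD_MS) return "WARN";
        if (extraSleepMs > INFO_THRESHOLD_MS) return "INFO";
        return null;  // pause too small to report
    }

    static void monitorLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            Thread.sleep(SLEEP_MS);
            long extra = (System.nanoTime() - start) / 1_000_000L - SLEEP_MS;
            String level = classify(extra);
            if (level != null) {
                System.out.println(level + ": detected JVM pause of ~" + extra + "ms");
            }
        }
    }
}
```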
pull-request-available
ZooKeeper ZOOKEEPER-3036

Unexpected exception in zookeeper

Bug Open Critical Unresolved Unassigned Oded Oded 09/May/18 06:48   14/Apr/19 12:57   3.4.10   quorum, server   0 8   3 Zookeepers, 5 kafka servers. We got an issue with one of the zookeepers (the Leader), causing the entire kafka cluster to fail:

2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2018-05-09 02:29:01,730 [myid:3] - WARN  [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE /192.168.0.91:42490 ********

 

We would expect ZooKeeper to choose another Leader and the Kafka cluster to continue working as expected, but that was not the case.

 
ZooKeeper ZOOKEEPER-3035

what do these operation codes mean

Wish Resolved Minor Not A Problem Unassigned liyuzhou liyuzhou 08/May/18 12:19   09/May/18 02:35 08/May/18 12:38         0 2   I'm reading the source code, but I often cannot understand what the operation codes in OpCode.java mean. For example, the sync operation code is 9, but I can't understand what this means, and the source code has no description of the codes. Do we have some wiki or document about the operation codes?
{code:java}
public interface OpCode {
public final int notification = 0;

public final int setACL = 7;

public final int getChildren = 8;

public final int sync = 9;

public final int ping = 11;
}
{code}
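The constants are, in short, the request "type" tags that get serialized into each request header so the server can dispatch the request. A toy illustration (hypothetical class, not the real RequestHeader):

```java
// Hypothetical sketch: the OpCode value travels as the "type" field of a
// request header; the server dispatches on it.
class RequestHeaderSketch {
    final int xid;   // client-side sequence number
    final int type;  // one of the OpCode constants, e.g. 9 == sync

    RequestHeaderSketch(int xid, int type) {
        this.xid = xid;
        this.type = type;
    }

    String describe() {
        switch (type) {
            case 0:  return "notification";
            case 7:  return "setACL";
            case 8:  return "getChildren";
            case 9:  return "sync";
            case 11: return "ping";
            default: return "unknown(" + type + ")";
        }
    }
}
```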
ZooKeeper ZOOKEEPER-3034

Facing issues while building from source

Bug Closed Minor Fixed Balazs Meszaros Namrata Bhave Namrata Bhave 07/May/18 05:21   04/Oct/19 10:55 26/Feb/19 10:39 3.4.11 3.6.0, 3.5.5 build   0 4 0 15600   Linux x86_64, Ubuntu 18.04, Ubuntu 17.10. Building ZooKeeper from source using the steps below:

{{git clone git://github.com/apache/zookeeper}}
{{cd zookeeper}}
{{git checkout tags/release-3.4.11}}
{{ant compile}}
{{cd src/c}}
{{sudo apt-get install -y libcppunit-dev}}
{{ACLOCAL="aclocal -I /usr/share/aclocal" autoreconf -if}}
{{./configure && make && sudo make install}}
{{sudo make distclean}}

 

The 'autoreconf -if' step fails with the error below:
+ ACLOCAL='aclocal -I /usr/share/aclocal'
+ autoreconf -if
configure.ac:37: warning: macro 'AM_PATH_CPPUNIT' not found in library
libtoolize: putting auxiliary files in '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: Consider adding 'AC_CONFIG_MACRO_DIRS([m4])' to configure.ac,
libtoolize: and rerunning libtoolize and aclocal.
libtoolize: Consider adding '-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
configure.ac:37: warning: macro 'AM_PATH_CPPUNIT' not found in library
configure.ac:37: error: possibly undefined macro: AM_PATH_CPPUNIT
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
autoreconf: /usr/bin/autoconf failed with exit status: 1
Build step 'Execute shell' marked build as failure
 

This is happening on Ubuntu 18.04. Can someone please help in resolving this error?
pull-request-available
ZooKeeper ZOOKEEPER-3033

ZOOKEEPER-3021 Step 1.2 - Create zk-recipes maven structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 07:51   02/Apr/19 06:40 17/Aug/18 05:57 3.5.4, 3.6.0, 3.4.12 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 8400   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* First iteration - safe changes including moving src/docs to zk-docs and creating an empty zk-it directory. Build and conf directories remain unchanged. These changes also have minimal impact on PRs.

* *Second iteration* - move src/recipes to zk-recipes.

* Third iteration - move src/contrib to zk-contrib.
* Fourth iteration - move src/c to zk-client (java will be moved in Phase 2)
* Fifth iteration - move jute under src directory
* Sixth iteration - move src/java/main to zk-server, which will be further separated in Step 2.

{noformat}
zookeeper
| -bin
| -conf
| -zookeeper-docs
| -zookeeper-it (integration tests)
| -zookeeper-recipes
| | -zookeeper-recipes-election
| | -zookeeper-recipes-lock
\ \ -zookeeper-recipes-queue
{noformat}
pull-request-available
ZooKeeper ZOOKEEPER-3032

ZOOKEEPER-3021 Step 1.6 - Create zk-server maven structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 07:49   02/Apr/19 06:40 06/Nov/18 10:42 3.6.0 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 58200   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* First iteration - safe changes including moving src/docs to zk-docs and creating an empty zk-it directory. Build and conf directories remain unchanged. These changes also have minimal impact on PRs.
* Second iteration - move src/recipes to zk-recipes.
* Third iteration - move src/contrib to zk-contrib.
* Fourth iteration - move src/c to zk-client (java will be moved in Phase 2)
* Fifth iteration - move jute under src directory

* *Sixth iteration* - move src/java/main to zk-server, also separate client code from server code, move common files to zookeeper-common.
*Modification*
It is not feasible to separate the core java files into server, client and common; they will remain in zookeeper-server.

{noformat}
zookeeper
| -bin
| -conf
| -jute
| -zookeeper-client
| | -zookeeper-client-c
| | - *REMOVED* zookeeper-client-java
| - *REMOVED* zookeeper-common
| -zookeeper-contrib
| | -zookeeper-contrib-fatjar
| | -zookeeper-contrib-huebrowser
| | -zookeeper-contrib-loggraph
| | -zookeeper-contrib-monitoring
| | -zookeeper-contrib-rest
| | -zookeeper-contrib-zkfuse
| | -zookeeper-contrib-zkperl
| | -zookeeper-contrib-zkpython
| | -zookeeper-contrib-zktreeutil
| \ -zookeeper-contrib-zooinspector
| -zookeeper-docs
| -zookeeper-it (integration tests)
| -zookeeper-server
| -zookeeper-recipes
| | -zookeeper-recipes-election
| | -zookeeper-recipes-lock
\ \ -zookeeper-recipes-queue

{noformat}
pull-request-available
ZooKeeper ZOOKEEPER-3031

ZOOKEEPER-3021 Step 1.4 - Create zk-client maven structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 07:47   02/Apr/19 06:40 23/Aug/18 10:12 3.6.0 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 6000   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* First iteration - safe changes including moving src/docs to zk-docs and creating an empty zk-it directory. Build and conf directories remain unchanged. These changes also have minimal impact on PRs.
* Second iteration - move src/recipes to zk-recipes.
* Third iteration - move src/contrib to zk-contrib.

* *Fourth iteration* - move src/c to zk-client (java will be moved in Phase 2)

* Fifth iteration - move jute under src directory
* Sixth iteration - move src/java/main to zk-server, which will be further separated in Step 2.

{noformat}
zookeeper
| -bin
| -conf
| -zookeeper-client
| | -zookeeper-client-c
| -zookeeper-contrib
| | -zookeeper-contrib-fatjar
| | -zookeeper-contrib-huebrowser
| | -zookeeper-contrib-loggraph
| | -zookeeper-contrib-monitoring
| | -zookeeper-contrib-rest
| | -zookeeper-contrib-zkfuse
| | -zookeeper-contrib-zkperl
| | -zookeeper-contrib-zkpython
| | -zookeeper-contrib-zktreeutil
| \ -zookeeper-contrib-zooinspector
| -zookeeper-docs
| -zookeeper-it (integration tests)
| -zookeeper-recipes
| | -zookeeper-recipes-election
| | -zookeeper-recipes-lock
\ \ -zookeeper-recipes-queue

{noformat}
pull-request-available
ZooKeeper ZOOKEEPER-3030

ZOOKEEPER-3021 Step 1.3 - Create zk-contrib maven structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 07:43   02/Apr/19 06:40 07/Aug/18 05:43 3.6.0 3.6.0, 3.5.5, 3.4.14 build, scripts   0 2 0 7800   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* First iteration - safe changes including moving src/docs to zk-docs and creating an empty zk-it directory. Build and conf directories remain unchanged. These changes also have minimal impact on PRs.
* Second iteration - move src/recipes to zk-recipes.

* *Third iteration* - move src/contrib to zk-contrib.

* Fourth iteration - move src/c to zk-client (java will be moved in Phase 2)
* Fifth iteration - move jute under src directory
* Sixth iteration - move src/java/main to zk-server, which will be further separated in Step 2.

{noformat}
zookeeper
| -bin
| -conf
| -zookeeper-contrib
| | -zookeeper-contrib-fatjar
| | -zookeeper-contrib-huebrowser
| | -zookeeper-contrib-loggraph
| | -zookeeper-contrib-monitoring
| | -zookeeper-contrib-rest
| | -zookeeper-contrib-zkfuse
| | -zookeeper-contrib-zkperl
| | -zookeeper-contrib-zkpython
| | -zookeeper-contrib-zktreeutil
| \ -zookeeper-contrib-zooinspector
| -zookeeper-docs
| -zookeeper-it (integration tests)
| -zookeeper-recipes
| | -zookeeper-recipes-election
| | -zookeeper-recipes-lock
\ \ -zookeeper-recipes-queue

{noformat}
pull-request-available
ZooKeeper ZOOKEEPER-3029

ZOOKEEPER-3021 Create pom files for jute, server and client

Sub-task Closed Blocker Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 06:28   24/Sep/19 02:45 08/Jan/19 04:27 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 build, scripts   0 3 0 7800   After the directory structure has been created, it is time to create the pom files for all the modules and create the build hierarchy.
At first, ant should remain in place until we are sure maven works fine.

jute and server should be the first priority. docs is handled in a different jira, as it is also being migrated. Recipes and contrib will remain for last.

The different modules will get their maven structure:
{noformat}
zookeeper-[something]
| -src
| | -main
| | | -java
| | | \org...
| | \resources
| | -test (unit tests only)
| | | -java
| | | \org...
| | \ resources
| | - it (integration tests)
| \pom.xml
{noformat}
pull-request-available
ZooKeeper ZOOKEEPER-3028

ZOOKEEPER-3021 Create assembly in pom.xml

Sub-task Closed Blocker Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 03/May/18 06:20   20/May/19 13:51 21/Feb/19 09:55 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.15 build, scripts   0 2 0 38400   After building the modules, the result should still be packaged in a single tar. An assembly plugin would be a relatively easy way to do this. pull-request-available
ZooKeeper ZOOKEEPER-3027

Accidently removed public API of FileTxnLog.setPreallocSize()

Bug Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 27/Apr/18 10:31   17/Jul/18 00:50 27/Apr/18 14:35 3.5.4, 3.6.0, 3.4.13 3.5.4, 3.6.0, 3.4.13 server   0 4   In my latest commit regarding TxnLogToolkit there's a refactor to outsource file padding logic from FileTxnLog to a separate class:

[https://github.com/apache/zookeeper/commit/126fb0f22d701cad58bf3123bf7d8f2219e60387#diff-89717124564925d61d29dd817bcdd915]

Unfortunately the public static method setPreallocSize(int) has also been moved to the new class, but it is actively being used by the hadoop-common project too:

[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java#L384]

I'd like to submit a patch that reverts the deleted method, delegating to the new one, which keeps backward compatibility with Hadoop.
ZooKeeper ZOOKEEPER-3026

ReadOnlyModeTest is using Thread deprecated API.

Bug Open Major Unresolved Andor Molnar Patrick D. Hunt Patrick D. Hunt 24/Apr/18 19:38   22/Jun/18 00:49   3.5.4, 3.6.0, 3.4.12       0 1   Same issue as ZOOKEEPER-2415

 

Suspend and resume are being called on peers (which are subclasses of Thread):
{quote}// if we don't suspend a peer it will rejoin a quorum
qu.getPeer(1).peer.suspend();

....

// resume poor fellow
qu.getPeer(1).peer.resume();
{quote}
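One common replacement (a sketch under the assumption that the worker's loop can check a flag at a safe point; not the actual fix) is a cooperative pause instead of Thread.suspend()/resume():

```java
// Hypothetical sketch: replace the deprecated Thread.suspend()/resume()
// with a pause flag the worker thread checks cooperatively in its loop.
class PausableWorker {
    private boolean paused;

    synchronized void pause()   { paused = true; }
    synchronized void unpause() { paused = false; notifyAll(); }
    synchronized boolean isPaused() { return paused; }

    // Called by the worker at a safe point in its loop; blocks while paused.
    synchronized void awaitIfPaused() {
        while (paused) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                return;
            }
        }
    }
}
```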
ZooKeeper ZOOKEEPER-3025

cmake windows build is broken on jenkins

Bug Resolved Blocker Fixed Andrew Schwartzmeyer Patrick D. Hunt Patrick D. Hunt 23/Apr/18 17:07   24/Apr/18 00:34 23/Apr/18 20:28 3.5.4, 3.6.0 3.5.4, 3.6.0 build   0 4   Jenkins build for windows cmake is failing:

started here:

[https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-windows-cmake/2717/console]
{noformat}
f:\jenkins\jenkins-slave\workspace\zookeeper-trunk-windows-cmake\src\c\src\hashtable\hashtable.h(6): fatal error C1083: Cannot open include file: 'winconfig.h': No such file or directory [F:\jenkins\jenkins-slave\workspace\ZooKeeper-trunk-windows-cmake\src\c\hashtable.vcxproj]
hashtable.c{noformat}
 

It looks like one or both of these commits are at issue (the jenkins build broke once these two changes were committed):
h2. [#2717 (Apr 16, 2018 4:58:17 AM)|https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-windows-cmake/2717/changes]
# ZOOKEEPER-3017: Link libm in CMake on FreeBSD. — [hanm|https://builds.apache.org/user/hanm/] / [detail|https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-windows-cmake/2717/changes#67378512285c4b8dc9be50b90bbd2967068fc24e]
# ZOOKEEPER-2999: CMake build should use target-level commands — [hanm|https://builds.apache.org/user/hanm/] / [detail|https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-windows-cmake/2717/changes#9ba4aeb4f92c1fc3167ff8e2b56e02f3e344d3ba]

 
Reviewed
ZooKeeper ZOOKEEPER-3024

C++ Client return sub paths in String_vector illegal after zoo_get_children completed with ZOK

Bug Open Major Unresolved Unassigned yijie yijie 23/Apr/18 02:05   23/Apr/18 02:13           0 1   We use the C++ client API:

int zoo_get_children(zhandle_t *zh, const char *path, int watch, struct String_vector *strings)

to list a ZooKeeper directory; zoo_get_children returns ZOK.

Then when we inspect strings, its contents are not right:

!image-2018-04-23-14-05-03-534.png!
ZooKeeper ZOOKEEPER-3023

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalFollowerRunWithDiff

Sub-task Open Major Unresolved Unassigned Pravin Dsilva Pravin Dsilva 20/Apr/18 08:59   28/Aug/19 22:22   3.6.0       0 5   Getting the following error on master branch:

Error Message
{code:java}
expected:<4294967298> but was:<0>{code}
Stacktrace
{code:java}
junit.framework.AssertionFailedError: expected:<4294967298> but was:<0>
    at org.apache.zookeeper.server.quorum.Zab1_0Test$5.converseWithFollower(Zab1_0Test.java:876)
    at org.apache.zookeeper.server.quorum.Zab1_0Test.testFollowerConversation(Zab1_0Test.java:523)
    at org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalFollowerRunWithDiff(Zab1_0Test.java:791)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79){code}
Flaky test: https://builds.apache.org/job/ZooKeeper-trunk-java10/141/testReport/junit/org.apache.zookeeper.server.quorum/Zab1_0Test/testNormalFollowerRunWithDiff/
ZooKeeper ZOOKEEPER-3022

ZOOKEEPER-3021 Step 1.1 - Create docs and it maven structure

Sub-task Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 20/Apr/18 07:26   02/Apr/19 06:40 04/Jul/18 07:03 3.6.0 3.6.0, 3.5.5, 3.4.14 build, scripts   0 3 0 6000   Create a project structure that separates the different parts of ZooKeeper into more meaningful packages for the future maven build.

This should be done in iterations to limit the impact.

* *First iteration* - safe changes including moving src/docs to zk-docs, creating zk-it empty directory. Build and conf directory remains unchanged. These changes also have minimum impact on PR’s.

* Second iteration - move src/recipes to zk-recipes.
* Third iteration - move src/contrib to zk-contrib.
* Fourth iteration - move src/c to zk-client (java will be moved in Phase 2)
* Fifth iteration - move jute under src directory
* Sixth iteration - move src/java/main to zk-server, which will be further separated in Step 2.

{noformat}
zookeeper
| -bin
| -conf
| -zookeeper-docs
| -zookeeper-it (integration tests)
{noformat}
100% 100% 6000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 37 weeks, 1 day ago 0|i3stf3:
ZooKeeper ZOOKEEPER-3021

Umbrella: Migrate project structure to Maven build

Improvement Closed Blocker Done Norbert Kalmár Norbert Kalmár Norbert Kalmár 20/Apr/18 07:07   19/Dec/19 18:01 27/Feb/19 05:12 3.5.4, 3.6.0, 3.4.13 3.5.5, 3.4.14 build, build-infrastructure, scripts   0 5   ZOOKEEPER-3022, ZOOKEEPER-3028, ZOOKEEPER-3029, ZOOKEEPER-3030, ZOOKEEPER-3031, ZOOKEEPER-3032, ZOOKEEPER-3033, ZOOKEEPER-3080, ZOOKEEPER-3122, ZOOKEEPER-3171, ZOOKEEPER-3223, ZOOKEEPER-3224, ZOOKEEPER-3225, ZOOKEEPER-3226, ZOOKEEPER-3256, ZOOKEEPER-3275, ZOOKEEPER-3285 In multiple steps, Maven should replace current ant build in ZooKeeper.

First phase - separate project structure that requires no code change:
{noformat}
zookeeper
|-bin
|-conf
|-zk-client
| |-zk-client-c
|-zk-contrib
| |-zk-contrib-fatjar
| |-zk-contrib-huebrowser
| |-zk-contrib-loggraph
| |-zk-contrib-monitoring
| |-zk-contrib-rest
| |-zk-contrib-zkfuse
| |-zk-contrib-zkperl
| |-zk-contrib-zkpython
| |-zk-contrib-zktreeutil
| \-zk-contrib-zooinspector
|-zk-docs
|-zk-it (integration tests)
|-zk-server
|-zk-recipes
| |-zk-recipes-election
| |-zk-recipes-lock
\ \-zk-recipes-queue
{noformat}
 
 
Second phase - separate modules that require code changes:
{noformat}
zookeeper
|-bin
|-conf
*|-jute*
|-zk-client
| |-zk-client-c
*| |-zk-client-java* (separated from zk-server)
*| \-zk-client-go* (or any other language)
*|-zk-common*
|-zk-contrib
| |-zk-contrib-fatjar
| |-zk-contrib-huebrowser
| |-zk-contrib-loggraph
| |-zk-contrib-monitoring
| |-zk-contrib-rest
| |-zk-contrib-zkfuse
| |-zk-contrib-zkperl
| |-zk-contrib-zkpython
| |-zk-contrib-zktreeutil
| \-zk-contrib-zooinspector
|-zk-docs
|-zk-it (integration tests)
|-zk-server
|-zk-recipes
| |-zk-recipes-election
| |-zk-recipes-lock
\ \-zk-recipes-queue
{noformat}

 
Every module will have the same maven structure:
{noformat}
zk-something
|-src
| |-main
| | |-java
| | | \org...
| | \resources
| \test (unit tests only?)
| |-java
| | \org...
| \resources
\pom.xml (build.xml, build.gradle?)
{noformat}

There is already ZOOKEEPER-1078, but its main approach is to create a maven proxy on top of ant.
The main idea here is to replace ant with "pure" maven, and update the project structure accordingly.

It is also worth noting that backporting only the package changes to 3.4 is a good practice for future backport commits. The Maven build implementation is not needed there, just a directory structure compatible with 3.5/master.
100% 243000 0 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks, 6 days ago 0|i3mg98:
ZooKeeper ZOOKEEPER-3020

Review of SyncRequestProcessor

Improvement Resolved Minor Fixed David Mollitor David Mollitor David Mollitor 18/Apr/18 19:43   30/Apr/19 21:27 30/Apr/19 12:08   3.6.0     0 3 0 14400   # Use {{ArrayDeque}} instead of {{LinkedList}}
# Use {{ThreadLocalRandom}} instead of {{Random}}
# Remove the 'running' flag - use the {{Thread#join}} facility to detect if the thread has stopped running. Using a flag can cause race condition issues and is superfluous.
# Make static final variable names in all caps
# General cleanup


{quote}
This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue.
{quote}

https://docs.oracle.com/javase/7/docs/api/java/util/ArrayDeque.html
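A minimal sketch of the first two suggested swaps, assuming a pending-request queue like the one in SyncRequestProcessor (the field and method names here are illustrative, not the actual class):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ThreadLocalRandom;

public class QueueSketch {
    // ArrayDeque instead of LinkedList for stack/queue usage
    private final Deque<String> toFlush = new ArrayDeque<>();

    void enqueue(String request) {
        toFlush.addLast(request);
    }

    String dequeue() {
        return toFlush.pollFirst(); // null when empty, no exception
    }

    // ThreadLocalRandom instead of a shared java.util.Random instance
    static int randRoll(int snapCount) {
        return ThreadLocalRandom.current().nextInt(snapCount / 2);
    }
}
```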
100% 100% 14400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch
46 weeks, 1 day ago 0|i3sqk7:
ZooKeeper ZOOKEEPER-3019

Add a metric to track number of slow fsyncs

Improvement Closed Major Fixed Norbert Kalmár Norbert Kalmár Norbert Kalmár 06/Apr/18 07:26   17/Jul/18 00:49 08/Jun/18 10:54 3.5.3, 3.4.11, 3.6.0 3.6.0, 3.4.13, 3.5.5 jmx, server   0 4 0 3600   Add a JMX bean and Command to the ZooKeeper server to expose the number of slow fsyncs as a metric.

FileTxnLog.commit() should count the number of times fsync exceeds fsyncWarningThresholdMS.
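A hedged sketch of the counting side, assuming a threshold check at the fsync call site as described; the counter and method names are illustrative, not the actual FileTxnLog code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class FsyncMetricSketch {
    // counter that a JMX bean or four-letter-word command could expose
    static final AtomicLong SLOW_FSYNC_COUNT = new AtomicLong();

    static void recordFsync(long syncElapsedMs, long fsyncWarningThresholdMS) {
        if (syncElapsedMs > fsyncWarningThresholdMS) {
            SLOW_FSYNC_COUNT.incrementAndGet();
        }
    }
}
```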
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 40 weeks, 6 days ago 0|i3s8db:
ZooKeeper ZOOKEEPER-3018

Ephemeral node not deleted after session is gone

Bug Open Major Unresolved Unassigned Daniel C Daniel C 04/Apr/18 16:09   04/Oct/19 10:55   3.4.6   server   0 5   Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 GNU/Linux We have a live Zookeeper environment (quorum size is 2) and observed a strange behavior:
* Kafka created 2 ephemeral nodes /brokers/ids/822712429 and /brokers/ids/707577499 on 2018-03-12 03:30:36.933
* The Kafka clients were long gone but as of today (20+ days after), the two ephemeral nodes are still present

 

Troubleshooting:

1) Lists the outstanding sessions and ephemeral nodes

 
{noformat}
$ echo dump | nc $SERVER1 2181
SessionTracker dump:
org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x162183ea9f70003:
               /brokers/ids/822712429
0x162183ea9f70002:
               /brokers/ids/707577499
               /controller
{noformat}
 

 

2) stat on /brokers/ids/822712429

 
{noformat}
zk> stat /brokers/ids/822712429
czxid: 4294967344
mzxid: 4294967344
pzxid: 4294967344
ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
version: 0
cversion: 0
aversion: 0
owner: 99668799174148099
datalen: 102
children: 0
{noformat}
 

 

3) List full connection/session details for all clients connected

 
{noformat}
$ echo cons | nc $SERVER1 2181
 /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
 /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
{noformat}
 

 

 
{noformat}
$ echo cons | nc $SERVER2 2181
 /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
 /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
{noformat}
 

 

4) health

 
{noformat}
$ echo mntr | nc $SERVER1 2181
zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency 443
zk_min_latency  0
zk_packets_received       11158019
zk_packets_sent               11158244
zk_num_alive_connections           2
zk_outstanding_requests              0
zk_server_state follower
zk_znode_count               344
zk_watch_count               0
zk_ephemerals_count     3
zk_approximate_data_size          36654
zk_open_file_descriptor_count   33
zk_max_file_descriptor_count     65536
{noformat}
 

 

5) Server logs with related sessions:
{noformat}
Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):

2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775

2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode

2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775

2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775

 
2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260

2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode

2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260

2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260

 
2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003{noformat}
6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
{noformat}
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1

3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1

3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
 

7) Additional questions from [~andorm]
{noformat}
1) Why is the session closed, the client closed it or the cluster expired it?

[Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster

 
2) which server was the session attached to - the first (44sec max lat) or one of the others? Which server was the leader?

[Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader

 
3) the znode exists on all 4 servers, is that right?

[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers

 {noformat}
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
49 weeks ago 0|i3s5of:
ZooKeeper ZOOKEEPER-3017

Link libm in CMake on FreeBSD

Task Resolved Minor Fixed David Forsythe David Forsythe David Forsythe 04/Apr/18 00:08   16/Apr/18 17:09 16/Apr/18 00:26   3.5.4, 3.6.0     0 2   Libm needs to be linked on FreeBSD. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 48 weeks, 3 days ago 0|i3s46v:
ZooKeeper ZOOKEEPER-3016

Follower QuorumCnxManager$Listener thread died due to incorrect client packet

Bug Resolved Major Fixed Unassigned sumit agrawal sumit agrawal 03/Apr/18 01:20   09/Apr/18 06:12 09/Apr/18 06:12 3.4.6 3.4.7     0 3   While accepting a connection from a client, if the message is malformed, a NegativeArraySizeException is thrown while creating a byte array of negative size.

 

~2018-03-02 23:51:21 [UTC:20180302T235121+0100]|INFO ||/xx.xx.xx.xx:3888hread|Coordination > Received connection request /yy.yy.yy.yy:18320 (QuorumCnxManager.java:511)~

~2018-03-02 23:51:21 [UTC:20180302T235121+0100]|ERROR||/xx.xx.xx.xx:3888hread|Coordination > Thread Thread[/xx.xx.xx.xx:3888,5,main] died (NIOServerCnxnFactory.java:44)~
~java.lang.NegativeArraySizeException~
~at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:242)~
~at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:513)~

 

Below is the code reference containing the issue:
{code:java}
int num_remaining_bytes = din.readInt();
byte[] b = new byte[num_remaining_bytes];
{code}

 

This makes other nodes in the quorum unable to connect to this node. Here the client was a security scanning application.

 

A check for invalid input must be present to avoid node crashes and for security.
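A sketch of the kind of validation the report asks for, rejecting the wire-format length before allocation; the MAX_PACKET_LEN cap and the method shape are illustrative, not the actual QuorumCnxManager fix:

```java
import java.io.DataInputStream;
import java.io.IOException;

public class LengthCheckSketch {
    static final int MAX_PACKET_LEN = 512 * 1024; // illustrative cap

    static byte[] readPayload(DataInputStream din) throws IOException {
        int numRemainingBytes = din.readInt();
        // reject negative or absurdly large lengths instead of letting
        // "new byte[num_remaining_bytes]" throw NegativeArraySizeException
        if (numRemainingBytes < 0 || numRemainingBytes > MAX_PACKET_LEN) {
            throw new IOException("Invalid payload length: " + numRemainingBytes);
        }
        byte[] b = new byte[numRemainingBytes];
        din.readFully(b);
        return b;
    }
}
```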

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 50 weeks, 2 days ago 0|i3s2t3:
ZooKeeper ZOOKEEPER-3015

Publish the value of getIdleRecv() in WatchedEvent of Disconnected

Improvement Open Minor Unresolved Unassigned Antonio Rafael Rodrigues Antonio Rafael Rodrigues 31/Mar/18 13:02   01/Apr/18 11:50   3.5.3   java client   0 2   In the class ClientCnxn, at line 1247:
{code:java}
eventThread.queueEvent(new WatchedEvent(
        Event.EventType.None,
        Event.KeeperState.Disconnected,
        null));
{code}

The current value of getIdleRecv() could be published inside the WatchedEvent, so that clients listening for this event could know exactly how much time has elapsed.

This would be especially useful in the case of the message "Client session timed out, have not heard from server in ". When the client receives a WatchedEvent with Event.KeeperState.Disconnected, it doesn't know whether it was due to an immediate loss of connection or to a lack of heartbeats. Publishing the value of getIdleRecv() would give a clue on that.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 50 weeks, 4 days ago 0|i3s0nz:
ZooKeeper ZOOKEEPER-3014

watch can be added to non-existent path by exists command

Bug Resolved Major Not A Problem Unassigned CHQ CHQ 28/Mar/18 04:48   04/Oct/19 10:55 29/Mar/18 12:09 3.4.5, 3.4.6 3.4.12 server   0 2   We have client A, which creates a znode with path /zk/lock/100000. Another client, B, is trying to acquire the lock, so it periodically calls the exists command with a watch to check whether the lock is available. Client A then finishes its work and deletes the znode. Client B still calls exists with a watch. Because the code doesn't check node existence, when the watch-add operation arrives, the watch is added to the non-existent node path.

This problem may be caused by the following code.
{code:java}
public Stat statNode(String path, Watcher watcher)
        throws KeeperException.NoNodeException {
    Stat stat = new Stat();
    DataNode n = nodes.get(path);
    if (watcher != null) {
        dataWatches.addWatch(path, watcher);
    }
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    synchronized (n) {
        n.copyStat(stat);
        return stat;
    }
}
{code}
The zk version we use is 3.4.5. We hit a problem where the zk client failed to re-establish its connection to the zk cluster after a disconnect. We found it was caused by ZOOKEEPER-706, but while trying to understand why there were so many watches, we discovered this problem.

 

 

 
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
1 year, 51 weeks ago 0|i3rvqv:
ZooKeeper ZOOKEEPER-3013

Fix Typo in doc

Bug Resolved Trivial Invalid Unassigned Sanjay Pillai Sanjay Pillai 27/Mar/18 17:21   27/Mar/18 17:30 27/Mar/18 17:30     documentation   0 1   Fix a minor typo in zookeeperProgrammers.html doc 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 2 days ago 0|i3rv7z:
ZooKeeper ZOOKEEPER-3012

Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders

Improvement Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 27/Mar/18 11:19   17/Jul/18 00:49 09/May/18 18:21 3.5.3, 3.4.11 3.5.4, 3.6.0, 3.4.13 server, tests   0 2   The following arrange methods uses hard coded values:
{noformat}
when(configMock.getDataDir()).thenReturn("/tmp/zookeeper");
when(configMock.getDataLogDir()).thenReturn("/tmp/zookeeperLog");
{noformat}
Which makes the test fail if the folders exist on the running machine.

Random test folders should be created and removed during cleanup.
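A sketch of the suggested arrange step: create unique, disposable folders per test run instead of hard-coded /tmp paths. The Mockito wiring (commented out) is illustrative of how the mock would consume them:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirSketch {
    // unique temp folders, safe even if /tmp/zookeeper already exists
    static Path[] createTestDirs() {
        try {
            Path dataDir = Files.createTempDirectory("zookeeper-data");
            Path logDir = Files.createTempDirectory("zookeeper-log");
            // when(configMock.getDataDir()).thenReturn(dataDir.toString());
            // when(configMock.getDataLogDir()).thenReturn(logDir.toString());
            return new Path[] { dataDir, logDir };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```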
unit-test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 45 weeks, 1 day ago
Reviewed
0|i3ru33:
ZooKeeper ZOOKEEPER-3011

Some NPEs, maybe

Bug Open Major Unresolved Unassigned lujie lujie 26/Mar/18 04:18   26/Mar/18 05:29   3.6.0       0 2   Inspired by ZK-3006, I developed a simple static analysis tool to find other potential NPEs like ZK-3006. Since I am a newbie here, I am not sure whether all of them will truly cause an NPE, but I list them here anyway (format: caller, callee):
# StaticHostProvider#updateServerList,StaticHostProvider#getServerAtCurrentIndex
# DataTree#getACL,ReferenceCountedACLCache#convertLong
# ConnectionBean#toString,ConnectionBean#getSourceIP
# Leader#propose,SerializeUtils#serializeRequest

Hopefully someone can confirm them and help improve this tool
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 3 days ago 0|i3rrs7:
ZooKeeper ZOOKEEPER-3010

Potential NPE in Observer#observeLeader and Follower#followLeader

Bug Open Major Unresolved Unassigned lujie lujie 26/Mar/18 04:09   22/Apr/18 03:23   3.6.0       0 3    

 
Inspired by ZK-3006, I developed a simple static analysis tool to find other potential NPEs like ZK-3006. This bug was found by the tool, and I have carefully studied it. But I am a newbie here, so I may be wrong; I hope someone can confirm it and help me improve the tool.
h2. Bug description:

The callee Learner#findLeader can return null; the callee's developer checks for this but only logs a warning:
{code:java}
// code placeholder
if (leaderServer == null) {
    LOG.warn("Couldn't find the leader with id = " + current.getId());
}
return leaderServer;
{code}
The callers Observer#observeLeader and Follower#followLeader use the return value directly, without a null check:
{code:java}
//Follower#followLeader
QuorumServer leaderServer = findLeader();
try {
    connectToLeader(leaderServer.addr, leaderServer.hostname);
    ..........
}

//Observer#observeLeader
QuorumServer leaderServer = findLeader();
LOG.info("Observing " + leaderServer.addr);
try {
    connectToLeader(leaderServer.addr, leaderServer.hostname);
}{code}
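A minimal sketch of the guard the report implies: fail fast when findLeader() returns null rather than dereferencing it. QuorumServer here is a simplified stand-in type, not the real quorum class:

```java
public class FollowerGuardSketch {
    static class QuorumServer { // simplified stand-in
        String addr;
        String hostname;
        QuorumServer(String addr, String hostname) {
            this.addr = addr;
            this.hostname = hostname;
        }
    }

    static String connectToLeader(QuorumServer leaderServer) {
        if (leaderServer == null) {
            // surface the problem instead of a later NPE
            throw new IllegalStateException("Couldn't find the leader");
        }
        return leaderServer.addr + " (" + leaderServer.hostname + ")";
    }
}
```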
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks, 4 days ago 0|i3rrrz:
ZooKeeper ZOOKEEPER-3009

Potential NPE in NIOServerCnxnFactory

Bug Closed Major Fixed lujie lujie lujie 26/Mar/18 03:46   04/Oct/19 10:55 15/Jun/18 02:03 3.6.0, 3.4.12 3.6.0, 3.5.5, 3.4.14     0 5 0 7200   Inspired by ZK-3006, I developed a simple static analysis tool to find other potential NPEs like ZK-3006. This bug was found by the tool, and I have carefully studied it. But I am a newbie here, so I may be wrong; I hope someone can confirm it and help me improve the tool.
h2. Bug description:

The class NIOServerCnxn has methods (getSocketAddress, getRemoteSocketAddress) that can return null, like:
{code:java}
// code placeholder
if (sock.isOpen() == false) {
    return null;
}
{code}
Some of their callers perform a null check; some (3 in total, listed below) do not.
{code:java}
// ServerCnxn#getConnectionInfo
Map<String, Object> info = new LinkedHashMap<String, Object>();
info.put("remote_socket_address", getRemoteSocketAddress()); // Map.put will throw NPE if parameter is null

//IPAuthenticationProvider#handleAuthentication
String id = cnxn.getRemoteSocketAddress().getAddress().getHostAddress();
cnxn.addAuthInfo(new Id(getScheme(), id)); // finally calls Set.add (it will throw NPE if parameter is null)

//NIOServerCnxnFactory#addCnxn
InetAddress addr = cnxn.getSocketAddress();
Set<NIOServerCnxn> set = ipMap.get(addr); // Map.get will throw NPE if parameter is null{code}
I think we should add a null check in the above three callers.
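One possible null-safe shape for the first caller, with getRemoteSocketAddress() modeled as a parameter that may be null (a sketch under that assumption, not the actual fix):

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

public class CnxnInfoSketch {
    static Map<String, Object> connectionInfo(InetSocketAddress remote) {
        Map<String, Object> info = new LinkedHashMap<>();
        // guard the possibly-null address before using it
        info.put("remote_socket_address", remote == null ? "unknown" : remote.toString());
        return info;
    }
}
```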

 
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 34 weeks, 4 days ago 0|i3rrq7:
ZooKeeper ZOOKEEPER-3008

Potential NPE in SaslQuorumAuthLearner#authenticate and SaslQuorumAuthServer#authenticate

Bug Open Major Unresolved Unassigned lujie lujie 26/Mar/18 03:42   11/Aug/18 23:40   3.6.0       0 3 0 3600   Inspired by ZK-3006, I developed a simple static analysis tool to find other potential NPEs like ZK-3006. This bug was found by the tool, and I have carefully studied it. But I am a newbie here, so I may be wrong; I hope someone can confirm it and help me improve the tool.
h2. Bug description:

The callee SecurityUtils#createSaslClient will return null when it encounters an exception:
{code:java}
// code placeholder
catch (Exception e) {
    LOG.error("Exception while trying to create SASL client", e);
    return null;
}
{code}
but its caller has no null check, like:
{code:java}
// code placeholder
sc = SecurityUtils.createSaslClient();
if (sc.hasInitialResponse()) {
    responseToken = createSaslToken(new byte[0], sc, learnerLogin);
}
{code}
I think we should add a null check in the caller when the callee returns null.
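A sketch of the proposed guard, using the javax.security.sasl.SaslClient type; the method name and error message are illustrative:

```java
import java.io.IOException;
import javax.security.sasl.SaslClient;

public class SaslGuardSketch {
    static boolean needsInitialResponse(SaslClient sc) throws IOException {
        if (sc == null) {
            // createSaslClient() logged and returned null; propagate a clear error
            throw new IOException("Failed to create SASL client");
        }
        return sc.hasInitialResponse();
    }
}
```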
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 31 weeks, 4 days ago 0|i3rrpr:
ZooKeeper ZOOKEEPER-3007

Potential NPE in ReferenceCountedACLCache#deserialize

Bug Closed Major Fixed lujie lujie lujie 26/Mar/18 03:30   17/Jul/18 00:50 26/Apr/18 18:24 3.6.0 3.5.4, 3.6.0, 3.4.13     0 5   Inspired by ZK-3006, I developed a simple static analysis tool to find other potential NPEs like ZK-3006. This bug was found by the tool, and I have carefully studied it. But I am a newbie here, so I may be wrong; I hope someone can confirm it and help me improve the tool.
h3. Bug description:

The callee BinaryInputArchive#startVector can return null:
{code:java}
// code placeholder
public Index startVector(String tag) throws IOException {
    int len = readInt(tag);
    if (len == -1) {
        return null;
    }
{code}
and the caller ReferenceCountedACLCache#deserialize calls it without a null check:
{code:java}
// code placeholder
Index j = ia.startVector("acls");
while (!j.done()) {
    ACL acl = new ACL();
    acl.deserialize(ia, "acl");
}{code}
but all 14 other callers of BinaryInputArchive#startVector perform a null check, like:
{code:java}
// code placeholder
Index vidx1 = a_.startVector("acl");
if (vidx1 != null) {
    for (; !vidx1.done(); vidx1.incr()) {
        .....
    }
}
{code}
So I think we also need to add a null check in the caller ReferenceCountedACLCache#deserialize, just like the other 14 callers.
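A minimal sketch of that guard, with Index reduced to a stand-in interface for the org.apache.jute type:

```java
public class AclDeserializeSketch {
    interface Index { // stand-in for org.apache.jute's Index
        boolean done();
        void incr();
    }

    static int countEntries(Index j) {
        int n = 0;
        if (j != null) { // the null check the other 14 callers already have
            for (; !j.done(); j.incr()) {
                n++;
            }
        }
        return n;
    }
}
```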

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks ago 0|i3rron:
ZooKeeper ZOOKEEPER-3006

Potential NPE in ZKDatabase#calculateTxnLogSizeLimit

Bug Resolved Major Fixed Edward Ribeiro lujie lujie 24/Mar/18 04:16   06/Apr/18 00:31 06/Apr/18 00:04 3.6.0 3.5.4, 3.6.0     0 6   I have found a potential NPE in ZKDatabase#calculateTxnLogSizeLimit:

 
{code:java}
//ZKDatabase
public long calculateTxnLogSizeLimit() {
    long snapSize = 0;
    try {
        snapSize = snapLog.findMostRecentSnapshot().length();
    } catch (IOException e) {
        LOG.error("Unable to get size of most recent snapshot");
    }
    return (long) (snapSize * snapshotSizeFactor);
}
{code}
In FileTxnSnapLog#findMostRecentSnapshot(), it returns the result of FileSnap#findMostRecentSnapshot:
{code:java}
// called by FileTxnSnapLog#findMostRecentSnapshot()
public File findMostRecentSnapshot() throws IOException {
    List<File> files = findNValidSnapshots(1);
    if (files.size() == 0) {
        return null;
    }
    return files.get(0);
}
{code}
So it will return null when the file list is empty, but ZKDatabase#calculateTxnLogSizeLimit has no null check.
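A sketch of the guard the report implies, treating a missing snapshot (null) as size zero; the method shape is illustrative, not the committed fix:

```java
import java.io.File;

public class SnapSizeSketch {
    static long txnLogSizeLimit(File mostRecentSnapshot, double snapshotSizeFactor) {
        // findMostRecentSnapshot() may return null when no snapshot exists
        long snapSize = (mostRecentSnapshot == null) ? 0 : mostRecentSnapshot.length();
        return (long) (snapSize * snapshotSizeFactor);
    }
}
```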

 

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 49 weeks, 6 days ago 0|i3rqlj:
ZooKeeper ZOOKEEPER-3005

Update zkEnv.cmd to check if environment variables already exist

Improvement Open Minor Unresolved Unassigned Mike J Mike J 22/Mar/18 19:23   22/Mar/18 19:23       other   0 2   Update the zkEnv.cmd script to not override ZOOCFGDIR, ZOO_LOG_DIR, or ZOO_LOG4J_PROP if they have already been set. This would match the functionality that currently exists in zkEnv.sh.

Also, add the ability to set the config file name using the ZOOCFG environment variable. This would match functionality that currently exists in zkEnv.sh.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years ago 0|i3rofr:
ZooKeeper ZOOKEEPER-3004

create jenkins jobs to test java 6 for branch 3.4

Bug Open Major Unresolved Unassigned Abraham Fine Abraham Fine 22/Mar/18 15:15   22/Mar/18 15:16   3.4.11       0 1   3.4 currently supports Java 6. While working on the release of 3.4.12 I noticed a minor issue while using java 6 to build zookeeper (see the linked issue). We should have a jenkins job that continuously tests 3.4 and pull requests targeting 3.4 against this older jdk. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years ago 0|i3ro0n:
ZooKeeper ZOOKEEPER-3003

ant package does not fail if javadoc generation fails

Bug Open Major Unresolved Unassigned Abraham Fine Abraham Fine 22/Mar/18 15:12   22/Mar/18 15:16   3.4.11       0 1   While working on the release of 3.4.12 and testing under JDK 6, I noticed that our javadoc task currently fails due to the Yetus API compatibility annotations we have. The Yetus annotations target JDK 7.

While I don't think this is too much of a problem, since it should not impact ZooKeeper operation under JDK 6, we should definitely avoid silent failures.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years ago 0|i3ro07:
ZooKeeper ZOOKEEPER-3002

Upgrade branches 3.5 and trunk to Java 1.8

Task Resolved Major Fixed Norbert Kalmár Andor Molnar Andor Molnar 22/Mar/18 13:55   17/Apr/18 00:58 17/Apr/18 00:57 3.5.4, 3.6.0 3.5.4, 3.6.0 java client, server   0 3   We upgrade the minimum required Java version to compile and run ZooKeeper on 3.5 and master branches to Java 1.8. java1.8, upgrade 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 48 weeks, 2 days ago 0|i3rnvr:
ZooKeeper ZOOKEEPER-3001

Incorrect log message when try to delete container node

Bug Resolved Trivial Fixed selfish finch selfish finch selfish finch 18/Mar/18 10:13   25/Mar/18 22:39 25/Mar/18 21:49 3.5.3 3.5.4, 3.6.0 server   0 4   The log message when trying to delete a container node is incorrect; it is missing a *_String.format_* call.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 3 days ago 0|i3rglr:
ZooKeeper ZOOKEEPER-3000

Use error-prone compiler

Improvement Open Major Unresolved Roman Leventov Roman Leventov Roman Leventov 17/Mar/18 13:21   05/Feb/20 07:16   3.5.4, 3.6.0 3.7.0, 3.5.8     0 2 0 600   See http://errorprone.info/ 100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks, 3 days ago 0|i3rg4f:
ZooKeeper ZOOKEEPER-2999

CMake build should use target-level commands

Improvement Resolved Minor Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 09/Mar/18 16:29   23/Apr/18 20:28 16/Apr/18 00:30 3.5.4, 3.6.0 3.5.4, 3.6.0     0 2   Originally suggested in [GitHub PR #386|https://github.com/apache/zookeeper/pull/386], the CMake build I wrote used {{include_directories}}, which has global side effects, instead of the more explicit {{target_include_directories}}, to include directories per target (and with private or public scoping).

Furthermore, it should also use {{CMAKE_CURRENT_SOURCE_DIR}} over {{CMAKE_SOURCE_DIR}} in order to allow inclusion in other projects via {{add_subdirectory()}}, and we can reduce the minimum required CMake version to 3.5 from 3.6.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 48 weeks, 3 days ago 0|i3r49r:
ZooKeeper ZOOKEEPER-2998

CMake declares incorrect ZooKeeper version

Bug Resolved Minor Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 09/Mar/18 16:05   25/Mar/18 23:37 25/Mar/18 22:17 3.6.0 3.6.0     0 4   The {{CMakeLists.txt}} file in the master branch declares version 3.5.3 instead of 3.6.0. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 3 days ago 0|i3r49b:
ZooKeeper ZOOKEEPER-2997

CMake should not force static CRT linking

Bug Resolved Major Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 09/Mar/18 16:00   26/Mar/18 02:02 25/Mar/18 22:19   3.5.4, 3.6.0     0 4   Windows When writing the CMake build, I erroneously forced ZooKeeper to link to the Windows CRT statically. Instead of setting this, we should rely on CMake's defaults and let users override it if they choose, by setting {{CMAKE_CXX_ARGS}}. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 3 days ago 0|i3r48n:
ZooKeeper ZOOKEEPER-2996

core in netlink with "Unexpected error 9 on netlink descriptor 20" error

Bug Open Major Unresolved Unassigned prashantkumar prashantkumar 09/Mar/18 00:28   09/Mar/18 00:30           0 1   I see below core with 3.5.1-alpha

{code:java}

#0 __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/raise.c:58
58 /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (LWP 20486)]
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007f39f9f439a1 in __GI_abort () at /usr/src/debug/glibc/2.24-r0/git/stdlib/abort.c:89
#2 0x00007f39f9f81ac0 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f39fa078959 "%s") at /usr/src/debug/glibc/2.24-r0/git/sysdeps/posix/libc_fatal.c:175
#3 0x00007f39f9f81b0a in __GI___libc_fatal (message=0x7f39e68c3350 "Unexpected error 9 on netlink descriptor 20") at /usr/src/debug/glibc/2.24-r0/git/sysdeps/posix/libc_fatal.c:185
#4 0x00007f39fa019315 in __GI___netlink_assert_response (fd=fd@entry=20, result=<optimized out>) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/netlink_assert_response.c:103
#5 0x00007f39fa0189f2 in make_request (pid=<optimized out>, fd=<optimized out>) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/check_pf.c:171
#6 __check_pf (seen_ipv4=seen_ipv4@entry=0x7f39e68c4642, seen_ipv6=seen_ipv6@entry=0x7f39e68c4643, in6ai=in6ai@entry=0x7f39e68c4650, in6ailen=in6ailen@entry=0x7f39e68c4658) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/check_pf.c:329
#7 0x00007f39f9fe9679 in __GI_getaddrinfo (name=<optimized out>, name@entry=0x7f39e560d2a0 "128.0.0.4", service=service@entry=0x7f39e560d2aa "2181", hints=hints@entry=0x7f39e68c4b60, pai=pai@entry=0x7f39e68c4b38) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/posix/getaddrinfo.c:2338
#8 0x00007f39f5d33ca5 in resolve_hosts (avec=0x7f39e68c4b40, hosts_in=0x7f39e560d250 "128.0.0.4:2181", zh=0x7f39e8756000) at /usr/src/debug/zookeeper/3.5.1-alpha-r0/zookeeper-3.5.1-alpha/src/c/src/zookeeper.c:723
#9 update_addrs (zh=zh@entry=0x7f39e8756000) at /usr/src/debug/zookeeper/3.5.1-alpha-r0/zookeeper-3.5.1-alpha/src/c/src/zookeeper.c:862
#10 0x00007f39f5d36611 in zookeeper_interest (zh=zh@entry=0x7f39e8756000, fd=fd@entry=0x7f39e68c4ce8, interest=interest@entry=0x7f39e68c4cec, tv=tv@entry=0x7f39e68c4d00) at /usr/src/debug/zookeeper/3.5.1-alpha-r0/zookeeper-3.5.1-alpha/src/c/src/zookeeper.c:2167
#11 0x00007f39f5d42ca8 in do_io (v=0x7f39e8756000) at /usr/src/debug/zookeeper/3.5.1-alpha-r0/zookeeper-3.5.1-alpha/src/c/src/mt_adaptor.c:380
#12 0x00007f3a00967490 in start_thread (arg=0x7f39e68eb700) at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
#13 0x00007f39f9ffc41f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

{code}

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 1 week, 6 days ago 0|i3r35z:
ZooKeeper ZOOKEEPER-2995

ant docs fails when Java 1.9 is present on my system

Bug Open Major Unresolved Unassigned Abraham Fine Abraham Fine 08/Mar/18 00:31   08/Mar/18 00:32   3.5.3, 3.4.11, 3.6.0       0 2   When attempting to compile the documentation (with JAVA_HOME set to 1.7) I see output like this:
{code}
$ ant clean
docs -Dforrest.home=$(brew info apache-forrest | grep /Cellar | awk '{print $1;}') -d
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Trying the default build file: build.xml
Buildfile: REDACTED/zookeeper/build.xml
Adding reference: ant.PropertyHelper
Detected Java version: 1.7 in: /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre

OTHER STUFF

docs:
Class org.apache.tools.ant.taskdefs.condition.Os loaded from parent loader (parentFirst)
Condition false; setting forrest.exec to forrest
Setting project property: forrest.exec -> forrest
[exec] Current OS is Mac OS X
[exec] Executing '/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
[exec] The ' characters around the executable and arguments are
[exec] not part of the command.
Execute:Java13CommandLauncher: Executing '/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
The ' characters around the executable and arguments are
not part of the command.
[exec] Apache Forrest. Run 'forrest -projecthelp' to list options
[exec]
[exec] Buildfile: /usr/local/Cellar/apache-forrest/0.9/libexec/main/forrest.build.xml
[exec]
[exec] check-java-version:
[exec] This is apache-forrest-0.9
[exec] Using Java 1.6 from /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home

MORE STUFF

[exec]
[exec] BUILD FAILED
[exec] /usr/local/Cellar/apache-forrest/0.9/libexec/main/targets/site.xml:180: Warning: Could not find file REDACTED/zookeeper/src/docs/build/tmp/brokenlinks.xml to copy.
[exec]
[exec] Total time: 3 seconds
[exec] -Djava.endorsed.dirs=/usr/local/Cellar/apache-forrest/0.9/libexec/lib/endorsed:${java.endorsed.dirs} is not supported. Endorsed standards and standalone APIs
[exec] Error: Could not create the Java Virtual Machine.
[exec] in modular form will be supported via the concept of upgradeable modules.
[exec] Error: A fatal exception has occurred. Program will exit.
[exec]
[exec] Copying broken links file to site root.
[exec]

BUILD FAILED
REDACTED/zookeeper/build.xml:501: exec returned: 1
at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:644)
at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:670)
at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:496)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
at org.apache.tools.ant.Project.executeTarget(Project.java:1376)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
at org.apache.tools.ant.Main.runBuild(Main.java:854)
at org.apache.tools.ant.Main.startAnt(Main.java:236)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:285)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:112)
{code}

The build succeeds when I uninstall java 9.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 2 weeks ago 0|i3r0yn:
ZooKeeper ZOOKEEPER-2994

Tool required to recover log and snapshot entries with CRC errors

New Feature Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 07/Mar/18 10:31   17/Jul/18 00:49 23/Apr/18 18:26 3.5.4, 3.6.0, 3.4.13 3.5.4, 3.6.0, 3.4.13     0 3   In the event that the ZooKeeper transaction log or snapshot becomes corrupted and fails CRC checks (preventing startup), we should have a mechanism to get the cluster running again.

Previously we achieved this by loading the broken transaction log with a modified version of ZK that had the CRC check disabled, and forcing it to snapshot.

It'd be very handy to have a tool which can do this for us. LogFormatter and SnapshotFormatter have already been designed to dump log and snapshot files; it'd be nice to extend their functionality and add the ability to perform such a recovery.

It has been proven that once you end up with a corrupt txn log, there is no way to recover except by manually disabling the CRC check. That's basically why the tool is needed.
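A recovery tool along these lines could reuse the Adler32 checksum that the transaction log already records per entry, with a flag to accept mismatching records instead of failing. A minimal sketch, with illustrative names rather than the real FileTxnLog API:

```java
import java.util.zip.Adler32;

// Hypothetical sketch: validate a txn-log record's Adler32 checksum, with a
// "skipCrc" flag a recovery tool could expose. Names are illustrative, not
// the actual FileTxnLog code.
public class CrcRecovery {
    // Compute the Adler32 checksum of a serialized record's bytes.
    public static long checksum(byte[] record) {
        Adler32 adler = new Adler32();
        adler.update(record, 0, record.length);
        return adler.getValue();
    }

    // Accept the record if the stored checksum matches, or unconditionally
    // when skipCrc is set (recovery mode).
    public static boolean accept(byte[] record, long storedCrc, boolean skipCrc) {
        return skipCrc || checksum(record) == storedCrc;
    }
}
```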
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks ago 0|i3qzs7:
ZooKeeper ZOOKEEPER-2993

.ignore file prevents adding src/java/main/org/apache/jute/compiler/generated dir to git repo

Bug Closed Minor Fixed jason wang jason wang jason wang 05/Mar/18 14:48   17/Jul/18 00:49 22/May/18 23:44 3.4.10 3.6.0, 3.4.13, 3.5.5 build   0 5 0 1200   There are Rcc.java and other required files under the src/java/main/org/apache/jute/compiler/generated directory.

However, when I tried to add the source distribution to our own git repo, the .gitignore file has "generated" as a keyword on line 55, which prevents the dir and the files under it from being added to the repo. The compilation later fails due to the missing dir and files.

{noformat}
compile_jute:
19:02:54 [mkdir] Created dir: /home/jenkins/workspace/3PA/PMODS/zookeeper-pgdi-patch-in-maven-repo/src/java/generated
19:02:54 [mkdir] Created dir: /home/jenkins/workspace/3PA/PMODS/zookeeper-pgdi-patch-in-maven-repo/src/c/generated
19:02:54 [java] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
19:02:54 [java] Error: Could not find or load main class org.apache.jute.compiler.generated.Rcc
19:02:54 [java] Java Result: 1
19:02:54 [java] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
19:02:54 [java] Error: Could not find or load main class org.apache.jute.compiler.generated.Rcc
19:02:54 [java] Java Result: 1
19:02:54 [touch] Creating /home/jenkins/workspace/3PA/PMODS/zookeeper-pgdi-patch-in-maven-repo/src/java/generated/.generated
{noformat}

 

The fix is to remove or comment out the "generated" keyword on line 55:

{noformat}
#
#generated
#
{noformat}

 
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 43 weeks, 1 day ago 0|i3qw0f:
ZooKeeper ZOOKEEPER-2992

The eclipse build target fails due to protocol redirection: http->https

Bug Resolved Major Fixed Shawn Heisey Shawn Heisey Shawn Heisey 04/Mar/18 11:51   05/Mar/18 03:34 04/Mar/18 21:52 3.5.3, 3.4.11 3.5.4, 3.6.0, 3.4.12 build   0 4   The eclipse build target downloads a component from sourceforge. It does this download with http, but sourceforge now requires https downloads. The sourceforge page redirects to https, but ant is refusing to follow the redirect because it changes protocol.

The download in build.xml just needs to be changed to https and it will work.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 2 weeks, 3 days ago 0|i3qucv:
ZooKeeper ZOOKEEPER-2991

Server logs error on shutdown

Bug Open Major Unresolved Unassigned Paul Millar Paul Millar 02/Mar/18 05:45   07/May/18 03:57   3.4.10, 3.4.11   server   0 1   Commit d497aac4 introduced the ZooKeeperServer#registerServerShutdownHandler method and corresponding ZooKeeperServerShutdownHandler class.  Both the method and class are package-protected, resulting in the expectation that non-ZK code should not use either.

However, if registerServerShutdownHandler is *not* called, then ZK will log an error:
{quote}ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
{quote}
There are several problems here.  In order of importance (for me, at least!)

First, (most important) this certainly should not be logged as an error.  Depending on usage, there may be no need for a shutdown handler.  Always logging an error (with no opportunity to silence it) is therefore wrong.

Second, the ability to learn of state changes may be of general interest (monitoring, etc.); however, this is not possible while the method is package-protected.

Third, the method accepts a concrete class that is designed to use a CountDownLatch. This is not appropriate in all cases.  The method should be updated to accept an interface.
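The third point could look something like the following sketch, where the existing latch-based behaviour becomes one implementation of a listener interface. All names here are illustrative, not the actual ZooKeeper API:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of an interface-based shutdown handler: the server
// notifies any ServerStateListener, and the current CountDownLatch behaviour
// is just one implementation. Not the real ZooKeeperServerShutdownHandler API.
public class ShutdownHandlerSketch {
    public enum State { RUNNING, ERROR, SHUTDOWN }

    // Proposed extension point: any observer of server state changes.
    public interface ServerStateListener {
        void stateChanged(State newState);
    }

    // The existing latch-based behaviour, re-expressed as one implementation.
    public static class LatchListener implements ServerStateListener {
        public final CountDownLatch latch = new CountDownLatch(1);

        @Override
        public void stateChanged(State newState) {
            if (newState == State.ERROR || newState == State.SHUTDOWN) {
                latch.countDown(); // release whoever is awaiting shutdown
            }
        }
    }
}
```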
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 45 weeks, 3 days ago 0|i3qsgn:
ZooKeeper ZOOKEEPER-2990

Implement probabilistic tracing

Improvement Open Minor Unresolved Bogdan Kanivets Bogdan Kanivets Bogdan Kanivets 02/Mar/18 02:30   04/Oct/19 10:55       server   0 2 0 1800   It would be nice to have an ability to do probabilistic tracing similar to Cassandra  [nodetool|https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsSetTraceProbability.html]

This will help debug issues in prod systems.

I'd like to contribute if everyone is ok with the feature.

My suggestion is to add an extra parameter to ZooTrace to handle it. Questions:
* should it be one global param or per each ZooTrace mask? I'm thinking per mask
* should it be a new 4lw or part of 'stmk'? Leaning towards new word and refactoring param passing to words (stmk is a special case right now).
* there are places in the code that use LOG.trace directly. That will have to change to ZooTrace

I can make some initial implementation for demo/review.
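As a rough illustration of the per-mask option, sampling could be a probability lookup plus a random draw. This is only a sketch with made-up names, not ZooTrace's real API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of per-mask probabilistic tracing. Each trace mask gets
// its own sampling probability; unset masks trace everything, preserving
// current behaviour. Names are illustrative.
public class ProbabilisticTrace {
    private static final ConcurrentHashMap<Long, Double> probabilities =
            new ConcurrentHashMap<>();

    // Set the sampling probability (clamped to 0.0..1.0) for one trace mask.
    public static void setTraceProbability(long mask, double p) {
        probabilities.put(mask, Math.min(1.0, Math.max(0.0, p)));
    }

    // Decide whether this particular event should be traced.
    public static boolean shouldTrace(long mask) {
        double p = probabilities.getOrDefault(mask, 1.0); // default: trace all
        return p >= 1.0 || ThreadLocalRandom.current().nextDouble() < p;
    }
}
```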

 
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks, 1 day ago 0|i3qs5r:
ZooKeeper ZOOKEEPER-2989

IPv6 literal address causes problems for Quorum members

Bug Resolved Major Duplicate Unassigned Rick Trudeau Rick Trudeau 01/Mar/18 15:34   27/Aug/19 23:41 27/Aug/19 23:41 3.5.3   quorum   0 2 0 7800   We're using ZK 3.5.3-beta.

When using literal IPv6 addresses in the zoo.cfg.dynamic file, ZK fails to come up: the connections to the peer ZKs keep getting reset.

zookeeper.log indicates a badly formed address is the cause.
{noformat}
<2018.03.01 15:14:30 163 -0500><E><sdn3></2001:db8:0:0:0:0:0:4:3888><org.apache.zookeeper.server.quorum.QuorumCnxManager> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: Badly formed address: 2001:db8:0:0:0:0:0:2:3888{noformat}
Our zoo.cfg.dynamic uses literal IPv6 addresses, which according to ZOOKEEPER-1460 are supported.
{noformat}
server.1=[2001:db8::2]:2888:3888
server.2=[2001:db8::3]:2888:3888
server.3=[2001:db8::4]:2888:3888{noformat}
 

Digging into QuorumCnxManager.java, InitialMessage.parse attempts to separate the host portion from the port portion using ":" as a delimiter, which is a problem for IPv6 addresses. And there's this comment:
{code:java}
// FIXME: IPv6 is not supported. Using something like Guava's HostAndPort
// parser would be good.{code}
So it looks like the peers' address:port fails to parse when they are specified as literal IPv6 addresses. To confirm a workaround, I replaced my zoo.cfg.dynamic with hostnames instead, and everything worked as expected.
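For illustration, an IPv6-aware split would treat a bracketed literal specially and otherwise split only on the last ':'. This is a hypothetical sketch, not the actual patch (the FIXME above suggests Guava's HostAndPort as the production alternative):

```java
// Hypothetical sketch of an IPv6-aware host:port split, the kind of parsing
// InitialMessage.parse would need. Illustrative only, not the real fix.
public class HostPort {
    public final String host;
    public final int port;

    private HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public static HostPort parse(String s) {
        if (s.startsWith("[")) {
            // Bracketed IPv6 literal, e.g. [2001:db8::2]:3888
            int close = s.indexOf(']');
            if (close < 0 || close + 2 > s.length() || s.charAt(close + 1) != ':') {
                throw new IllegalArgumentException("Badly formed address: " + s);
            }
            return new HostPort(s.substring(1, close),
                    Integer.parseInt(s.substring(close + 2)));
        }
        // Hostname or IPv4: split on the last ':' only.
        int colon = s.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Badly formed address: " + s);
        }
        return new HostPort(s.substring(0, colon),
                Integer.parseInt(s.substring(colon + 1)));
    }
}
```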

100% 100% 7800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks, 5 days ago 0|i3qrfr:
ZooKeeper ZOOKEEPER-2988

NPE triggered if server receives a vote for a server id not in their voting view

Bug Closed Minor Fixed Brian Nixon Brian Nixon Brian Nixon 01/Mar/18 15:18   17/Jul/18 00:49 30/Apr/18 00:35 3.5.3, 3.4.11, 3.4.12 3.5.4, 3.6.0, 3.4.13 leaderElection   0 4   We've observed the following behavior in elections when a node is lagging behind the quorum in its view of the ensemble topology.

- Node A is operating with node B in its voting view, but without view of node C.

- B votes for C.

- A then switches its vote to C, but throws a NPE when attempting to connect.

This causes the QuorumPeer to spin up a Follower only to immediately have it shut down by the exception.

Ideally, A would not advertise a vote for a server that it will not follow.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 46 weeks, 1 day ago 0|i3qrfb:
ZooKeeper ZOOKEEPER-2987

Invalid PurgeTxnLog params order in the zkCleanup.sh

Bug Open Major Unresolved Unassigned Maciej Lopacinski Maciej Lopacinski 28/Feb/18 17:22   28/Feb/18 17:23           0 2   The order of params passed to the PurgeTxnLog via the zkCleanup.sh script is invalid.

 

See PR for details.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 3 weeks, 1 day ago 475 https://github.com/apache/zookeeper/pull/475 0|i3qpwf:
ZooKeeper ZOOKEEPER-2986

My id not in the peer list

Bug Open Major Unresolved Unassigned Mohammad Etemad Mohammad Etemad 22/Feb/18 14:25   28/Aug/19 22:25   3.5.3       0 2   Running in a Docker container on Kubernetes 1.5, ZooKeeper 3.5.3-beta throws the error "My id 1 not in the peer list". If I use the alpha version (3.5.2) and then upgrade to the 3.5.3 beta version, the problem goes away, but if I deploy 3.5.3 directly, the clustering never happens and I get the error. To give you a bit more of an overview of the implementation:
 
The pods use a persistent volume claim on a gluster volume. Each pod is assigned its own volume on the gluster file system. I run zookeeper as a stateful set with 3 pods. 
 
In my cfg file I have:
 
{code:java}
standaloneEnabled=false
tickTime=2000
initLimit=10
syncLimit=5
#snapshot file dir
dataDir=/data
#tran log dir
dataLogDir=/dataLog
#zk log dir
logDir=/logs
4lw.commands.whitelist=*
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
  
and in my cfg.dynamic file I have:
  
{code:java}
server.0=zookeeper-0:2888:3888
server.1=zookeeper-1:2888:3888
server.2=zookeeper-2:2888:3888{code}
  
Has there been any change on the clustering side of things that makes the new version not work?
Sample logs:
{code:java}
2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 1 not in the peer list
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79){code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
29 weeks ago 0|i3qhlz:
ZooKeeper ZOOKEEPER-2985

Expired session may become unexpired after leader failover

Bug Open Major Unresolved Unassigned Chris Thunes Chris Thunes 22/Feb/18 11:59   26/Jul/19 08:13   3.5.3, 3.4.11       2 21   We recently observed an inconsistency in our Kafka cluster which we tracked down to ZooKeeper sessions expiring and then re-appearing after a ZooKeeper leadership failover. The Kafka nodes received session "Expired" events, leading to them starting new sessions and attempting to re-create some ephemeral nodes (broker ID nodes in kafka/brokers/ids specifically). However, between receiving the session Expired event and establishing a new session a leadership failover occurred within the ZooKeeper cluster which resulted in the expired session re-appearing. When Kafka attempted to re-create the ephemeral nodes mentioned above it (unexpectedly) received NODEEXISTS errors.

This behavior is a result of how session expiration is handled by the leader. Specifically, the expired session is marked as "closing" immediately upon expiration (in SessionTrackerImpl) and _before_ the corresponding "closeSession" entry is committed. A client can therefore receive a session Expired event before its session is fully closed. A leadership failover which results in the loss of the (uncommitted) closeSession entry thus leads to the sessions' ephemeral nodes "re-appearing" until another expiration of the old session on the new leader takes place.

I'm not certain if this should be considered a bug or an edge case that client are expected to handle. If it is the latter then I think it would be good to include this in the Programmer's Guide in the documentation.

If it's helpful I have code to reproduce this on an in-process cluster running 3.4.11 or 3.5.3-beta.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
33 weeks, 6 days ago 0|i3qhav:
ZooKeeper ZOOKEEPER-2984

Master

Bug Open Major Unresolved Unassigned Yayan Sinchan Yayan Sinchan 21/Feb/18 03:21   27/Aug/19 10:07           0 3   9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch, Important
29 weeks, 2 days ago
Incompatible change, Reviewed
0|i3qerb:
ZooKeeper ZOOKEEPER-2983

Print the classpath when running compile and test ant targets

Improvement Resolved Major Won't Fix Mark Fenes Mark Fenes Mark Fenes 20/Feb/18 08:43   27/Jun/18 10:40 27/Jun/18 10:40 3.5.3, 3.4.11   build   0 2   Printing the classpath helps to verify that only the intended classes and jars are on the classpath, e.g. that clover.jar is included only when running coverage tests.

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks, 1 day ago 0|i3qdnz:
ZooKeeper ZOOKEEPER-2982

Re-try DNS hostname -> IP resolution

Bug Resolved Blocker Fixed Flavio Paiva Junqueira Eron Wright Eron Wright 19/Feb/18 14:28   31/Dec/19 14:46 08/May/18 18:57 3.5.0, 3.5.1, 3.5.3 3.5.4, 3.6.0 server   0 8   ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix haven't yet been ported to 3.5.

To recap the outstanding problem in 3.5, if a given ZK server is started before all peer addresses are resolvable, that server may cache a negative lookup result and forever fail to resolve the address. For example, deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) may fail because the DNS records are created lazily.

{code}
2018-02-18 09:11:22,583 [myid:0] - WARN [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
{code}

In the above example, the address `zk-2.zk.default.svc.cluster.local` was not resolvable when the server started, but became resolvable shortly thereafter. The server should eventually succeed but doesn't.
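The missing piece is essentially a retry loop around resolution instead of trusting the first negative result. A minimal sketch under assumed names (the Lookup interface here is only a test seam, not the real code being ported):

```java
// Hypothetical sketch of retried host resolution, mirroring the 3.4 fix the
// report asks to port to 3.5. Illustrative names, not the actual patch.
public class RetryResolver {
    // Test seam: resolves a host to an address string, or null on failure.
    public interface Lookup {
        String resolve(String host);
    }

    // Re-attempt resolution with a fixed backoff instead of caching the
    // first negative result forever.
    public static String resolveWithRetry(String host, Lookup lookup,
            int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String resolved = lookup.resolve(host);
            if (resolved != null) {
                return resolved;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null; // still unresolvable after all attempts
    }
}
```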
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
1 year, 45 weeks, 1 day ago 0|i3qcrj:
ZooKeeper ZOOKEEPER-2981

ZOOKEEPER-2933 Fix build on branch-3.5 for ZOOKEEPER-2939

Sub-task Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 14/Feb/18 05:54   14/Feb/18 15:16 14/Feb/18 15:07 3.5.4 3.5.4 build   0 2   The commit (5ae5f1076e56947db5694ff8ab06c3d0b4f5d802) which has been cherry-picked from master for ZOOKEEPER-2939 caused build failure. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 5 weeks, 1 day ago 0|i3q6bz:
ZooKeeper ZOOKEEPER-2980

ZOOKEEPER-2933 Backport ZOOKEEPER-2939 Deal with maxbuffer as it relates to proposals

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 12/Feb/18 10:11   17/Jul/18 00:50 11/May/18 16:57   3.4.13 server   0 3 0 1800   Backport ZOOKEEPER-2939 to branch-3.4. 100% 100% 1800 0 1, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 44 weeks, 6 days ago 1
Incompatible change
1 0|i3q2q7:
ZooKeeper ZOOKEEPER-2979

ZOOKEEPER-2933 Use dropwizard library histogram for proposal-related metrics

Sub-task Resolved Major Won't Fix Andor Molnar Andor Molnar Andor Molnar 12/Feb/18 10:09   08/Oct/18 08:53 02/Oct/18 05:38     server   0 3 0 600   This Jira is intended to be the successor of ZOOKEEPER-2939.

By using dropwizard library's Histogram component we'll be able to provide more sophisticated statistics on Proposal sizes.

From the docs:
"A histogram measures the statistical distribution of values in a stream of data. In addition to minimum, maximum, mean, etc., it also measures median, 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles."

[http://metrics.dropwizard.io/3.1.0/manual/core/#histograms]
100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 24 weeks, 2 days ago 0|i3q2pz:
ZooKeeper ZOOKEEPER-2978

fix potential null pointer exception when deleting node

Bug Resolved Trivial Fixed Unassigned achimbab achimbab 12/Feb/18 08:57   20/Feb/18 18:35 20/Feb/18 17:38 3.4.11 3.5.4, 3.6.0, 3.4.12 java client   0 4   At line 518, 'existWatches.remove(clientPath)' returns null because the watches for clientPath have already been removed.

https://github.com/apache/zookeeper/pull/461/commits/a6044af23ae1096a8c5305633320fa139cf730b2
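The bug pattern is the classic one where Map.remove returns null for an already-absent key, so the result must be null-checked before use. A minimal sketch with hypothetical names (the real code lives in the client's watch management):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the guard: Map.remove returns null when the key was
// already removed, so dereferencing the result unconditionally throws an NPE.
public class RemoveGuard {
    public static int drainWatches(Map<String, List<String>> existWatches,
            String clientPath) {
        List<String> watchers = existWatches.remove(clientPath);
        if (watchers == null) {
            return 0; // already removed: avoid the NPE
        }
        return watchers.size();
    }
}
```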

 
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
2 years, 4 weeks, 2 days ago 0|i3q2lb:
ZooKeeper ZOOKEEPER-2977

Concurrency for addAuth corrupts quorum packets

Bug Patch Available Critical Unresolved sumit agrawal sumit agrawal sumit agrawal 11/Feb/18 22:40   04/Oct/19 10:55   3.4.9   quorum   1 5 0 4800   Affects all versions in 3.4.x. When a client performs addAuth multiple times with different credentials at a follower concurrently, the communication between the follower and the leader gets corrupted. This causes a shutdown of the Follower due to the failure.

Analysis:

In the org.apache.zookeeper.server.quorum.QuorumPacket.serialize method:
* a_.startVector(authinfo, "authinfo") is called, which writes the length of authinfo to the packet (suppose it writes length 1)
* the length of authinfo is then read again in order to write all the entries in a loop (here it gets length 2)

In this concurrency scenario the buffer gets corrupted, leaving extra bytes in the channel for the additional authinfo entry.

So when the Leader reads the next quorum packet, it first consumes the leftover (incorrect) extra bytes and may interpret the corrupt byte pattern as a much larger message size, causing an exception:

Coordination > Unexpected exception causing shutdown while sock still open (LearnerHandler.java:633)
java.io.IOException: Unreasonable length = 1885430131

 

 

ServerCnxn.getAuthInfo returns an unmodifiable list, but addAuthInfo performs no synchronization against concurrent readers, which is what allows this concurrency issue.
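A common fix for this shape of race is to serialize from an immutable snapshot taken once, so the declared length and the elements actually written always agree. A sketch, not the actual patch (QuorumPacket really serializes authinfo via jute):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: take one consistent snapshot of the auth list before
// serializing, so a concurrent addAuth cannot change the element count after
// the vector length has been written.
public class AuthInfoSnapshot {
    public static List<String> snapshot(List<String> authInfo) {
        synchronized (authInfo) {
            return Collections.unmodifiableList(new ArrayList<>(authInfo));
        }
    }
}
```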

100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 27 weeks, 6 days ago Quorum, shutdown 0|i3q20f:
ZooKeeper ZOOKEEPER-2976

What is the latest stable version?

Bug Resolved Major Done Unassigned ilovezfs ilovezfs 03/Feb/18 13:01   21/Feb/18 07:25 21/Feb/18 07:25         0 2   https://www.apache.org/dyn/closer.cgi?path=zookeeper/stable

 

shows 3.4.10. But 3.4.11 is shown on http://zookeeper.apache.org/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 6 weeks, 5 days ago 0|i3prkf:
ZooKeeper ZOOKEEPER-2975

Reconnection should happen without any problem after zk server is restarted

Bug Open Major Unresolved Unassigned Marimuthu PMS Dhavamani Marimuthu PMS Dhavamani 02/Feb/18 05:27   02/Feb/18 05:43   3.4.9   server   0 1   Zookeeper 3.4.9

Zookeeper Server ---------------- Zookeeper Client (both are running on the same machine)

 

 
1) Create a session from the ZK client (self client)
2) Stop the zkServer while the ZK client is still connected
3) Wait for the socket to be cleared on the server side (the server-side TCP session should be removed from TIME_WAIT status)
4) Start the zkServer
5) Now reconnection from the ZK client is denied with the error below. Please analyse; reconnection should happen without any problem.

{noformat}
2018-02-01 13:01:26 [UTC:20180201T130126+0800]|INFO ||NIOServerCxn.Factory:/10.18.14.188:2181hread|Coordination > Accepted socket connection from /10.18.14.188:51281 (NIOServerCnxnFactory.java:210)
2018-02-01 13:01:26 [UTC:20180201T130126+0800]|INFO ||NIOServerCxn.Factory:/10.18.14.188:2181hread|Coordination > Client attempting to renew ClientSeqID 0x1614faa91d90002 at /10.18.14.188:51281 (ZooKeeperServer.java:968)
2018-02-01 13:01:26 [UTC:20180201T130126+0800]|INFO ||NIOServerCxn.Factory:/10.18.14.188:2181hread|Coordination > Invalid ClientSeqID 0x1614faa91d90002 for client /10.18.14.188:51281, probably expired (ZooKeeperServer.java:687)
2018-02-01 13:01:26 [UTC:20180201T130126+0800]|INFO ||NIOServerCxn.Factory:/10.18.14.188:2181hread|Coordination > Closed socket connection for client /10.18.14.188:51281 which had ClientSeqID 0x1614faa91d90002 (NIOServerCnxn.java:1041)
2018-02-01 13:01:26 [UTC:20180201T130126+0800]|INFO ||NIOServerCxn.Factory:/10.18.14.188:2181hread|Coordination > Accepted socket connection from /10.18.14.188:51282 (NIOServerCnxnFactory.java:210)
{noformat}
9223372036854775807
No Perforce job exists for this issue. 0 9223372036854775807
Important
2 years, 6 weeks, 6 days ago 0|i3ppgv:
ZooKeeper ZOOKEEPER-2974

Link invalid, please update http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html

Bug Open Major Unresolved Unassigned Ethan Wang Ethan Wang 01/Feb/18 19:42   01/Feb/18 19:42           0 1   Saw this link at [https://curator.apache.org/getting-started.html]

 
Curator users are assumed to know ZooKeeper. A good place to start is here: [http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html] 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 6 weeks, 6 days ago 0|i3poz3:
ZooKeeper ZOOKEEPER-2973

"Unreasonable length" exception

Bug Open Blocker Unresolved Unassigned wanggang_123 wanggang_123 31/Jan/18 03:45   16/Jun/19 07:07   3.4.6       1 4   I am running a three node ZooKeeper cluster. At 2018-01-28 17:56:30 the leader node logged this error:

{noformat}
2018-01-28 17:56:30 [UTC:20180128T175630+0800]|ERROR||LearnerHandler-/118.123.180.23:44836hread|Coordination > Unexpected exception causing shutdown while sock still open (LearnerHandler.java:633)
java.io.IOException: Unreasonable length = 1885430131
 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
 at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
 at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2018-01-28 17:56:30 [UTC:20180128T175630+0800]|WARN ||LearnerHandler-/118.123.180.23:44836hread|Coordination > ******* GOODBYE /118.123.180.23:44836 ******** (LearnerHandler.java:646)
2018-01-28 17:56:30 [UTC:20180128T175630+0800]|INFO ||ProcessThread(sid:2 cport:-1):hread|Coordination > Got user-level KeeperException when processing sessionid:0x16138593ad43cf9 type:delete cxid:0x5 zxid:0xc104b59e9 txntype:-1 reqpath:n/a Error Path:/VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-0000093037 Error:KeeperErrorCode = NoNode for /VSP/Leader/syncScore-0/_c_9101a3d6-f431-4792-b71d-a493e938895d-latch-0000093037 (PrepRequestProcessor.java:645)
{noformat}
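For context, the "Unreasonable length" message comes from a sanity bound on the declared buffer length: once the stream position is corrupted, garbage bytes read as a huge length and trip the guard. A simplified sketch of such a check (the constant here is an illustrative assumption; the real bound is configured via jute.maxbuffer):

```java
import java.io.IOException;

// Simplified sketch of the length guard that produces the error above. Any
// declared buffer length outside the sane range is rejected before an
// allocation is attempted. Bound is illustrative, not the real default.
public class LengthGuard {
    static final int MAX_BUFFER = 1024 * 1024; // assumed 1 MB bound

    public static boolean isReasonable(int len) {
        return len >= 0 && len <= MAX_BUFFER;
    }

    public static void checkLength(int len) throws IOException {
        if (!isReasonable(len)) {
            throw new IOException("Unreasonable length = " + len);
        }
    }
}
```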
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
39 weeks, 4 days ago 0|i3pl2n:
ZooKeeper ZOOKEEPER-2972

When using SSL on the zookeeper server, watch counts may increase to more than forty thousand and lead the zookeeper process to an OutOfMemory error

Bug Open Major Unresolved Unassigned wuyiyun wuyiyun 29/Jan/18 03:16   31/Jan/18 20:09   3.5.3   recipes   0 1   I deployed a zookeeper cluster on three nodes and enabled the SSL capability following this guideline: [https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide]

And I use a zookeeper client (which also has the SSL capability enabled) to connect to this zookeeper server and set the same data on two nodes, as in this demo:

{code:java}
CuratorFramework client;
// each time we instance a new zookeeper client ....
String path1, path2;
// instance path1, path2 ......
String status = "ok";
client.setData().forPath(path1, status.getBytes());
client.setData().forPath(path2, status.getBytes());
// close zookeeper client.......
{code}

This function is called every five seconds and works fine while SSL is disabled. With SSL enabled, after the zookeeper server runs for about a day an OutOfMemory error occurs and the zookeeper process produces a java_pidXXX.hprof file. Using Eclipse Memory Analyzer on the hprof file, I found that the DataTree instance used more than six hundred MB of memory, and more than eighty-seven percent of that was used by DataTree's dataWatches field. The four letter commands also show too many watches on all three nodes. I guess these watches cause the error, but I don't know why there are so many.

Additionally, with the SSL capability disabled, the four letter command shows only a few watches on each node, and the watch count does not increase.

 

Each zookeeper node runs in a VM with eight cores and eight GB of memory; the OSes are centos6.5/centos7.3/redhat6.5/redhat7, running zookeeper and this demo with JDK 1.8.

This issue happens under zookeeper 3.5.1, 3.5.2 and 3.5.3.


check command:

echo wchs | nc localhost 2181

check result:

[zookeeper@localhost bin]$ echo wchs | nc localhost 2181
44412 connections watching 1 paths
Total watches:44412
features 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 7 weeks, 3 days ago 0|i3ph3r:
ZooKeeper ZOOKEEPER-2971

Create release notes for 3.5.4

Improvement Resolved Blocker Fixed Patrick D. Hunt Jordan Zimmerman Jordan Zimmerman 28/Jan/18 12:31   10/May/18 17:28 10/May/18 17:28 3.5.3 3.5.4 documentation   0 1   ZOOKEEPER-2901 and ZOOKEEPER-2903 fix a serious bug with TTL nodes in 3.5.3. The release notes for 3.5.4 should describe the problem and how it was worked-around/fixed. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 45 weeks ago 0|i3pgov:
ZooKeeper ZOOKEEPER-2970

ZOOKEEPER-3170 Flaky Test: testNullQuorumAuthServerWithValidQuorumAuthPacket

Sub-task Resolved Major Cannot Reproduce Andor Molnar Mark Fenes Mark Fenes 25/Jan/18 09:06   25/Oct/18 11:23 25/Oct/18 11:23 3.4.5       0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 21 weeks ago 0|i3pcvr:
ZooKeeper ZOOKEEPER-2969

C API Log Callback Lacks Context

Bug Open Minor Unresolved Unassigned Travis Gockel Travis Gockel 19/Jan/18 18:38   19/Jan/18 19:02   3.5.2, 3.5.3, 3.6.0       0 1   I have two zhandle_ts connected to two different ZK ensembles. Differentiating between log messages of the two is quite difficult, as the callback only gives you the message, with no reasonable way to grab the connection that created it (the address of the handle is in the log message, but parsing that value seems rather error-prone). It would be nice if the log callback gave me the handle.
 

I attached a patch for a potential fix; it adds a few functions without breaking backwards compatibility:

 

{code}
typedef void (*log_callback_ext_fn)(const zhandle_t *zh,
    const void *log_context, ZooLogLevel level, const char *message);

ZOOAPI void zoo_get_log_callback_ext(const zhandle_t *zh,
    log_callback_ext_fn *callback, const void **context);

ZOOAPI void zoo_set_log_callback_ext(zhandle_t *zh,
    log_callback_ext_fn callback, const void *context);

ZOOAPI zhandle_t *zookeeper_init3(const char *host, watcher_fn fn,
    int recv_timeout, const clientid_t *clientid, void *context, int flags,
    log_callback_ext_fn log_callback, const void *log_callback_context);
{code}

 

The fallback ordering is changed to: log_callback_ext_fn -> log_callback_fn -> global stream.

Let me know if this is completely crazy.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 8 weeks, 6 days ago 0|i3p4zj:
ZooKeeper ZOOKEEPER-2968

Add C client code coverage tests

Test Closed Major Fixed Mark Fenes Mark Fenes Mark Fenes 18/Jan/18 11:30   17/Jul/18 00:49 31/May/18 15:58 3.5.3, 3.4.11 3.6.0, 3.4.13, 3.5.5 tests   0 3 0 1800   We have limited code coverage support in ZK. Similarly to Java code coverage tests add C client code coverage tests by using GCC tools like gcov and lcov. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks ago 0|i3p2ov:
ZooKeeper ZOOKEEPER-2967

Add check to validate dataDir and dataLogDir parameters at startup

Improvement Resolved Major Fixed Mark Fenes Andor Molnar Andor Molnar 18/Jan/18 10:55   11/May/18 15:47 20/Feb/18 14:29 3.4.11 3.5.4, 3.6.0, 3.4.12 server   0 5   According to  -ZOOKEEPER-2960- we should add a startup check to validate that the dataDir and dataLogDir parameters are set correctly.

Perhaps we should introduce a check of some kind? If dataLogDir is different from dataDir and snapshots exist in dataLogDir, we throw an exception and quit.
startup, validation 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Important
1 year, 44 weeks, 6 days ago 0|i3p2kv:
ZooKeeper ZOOKEEPER-2966

ZOOKEEPER-3170 Flaky NullPointerException while closing client connection

Sub-task Resolved Critical Cannot Reproduce Unassigned Enrico Olivelli Enrico Olivelli 17/Jan/18 03:04   28/Feb/19 08:20 28/Feb/19 08:20 3.5.3   java client   0 3   It is not always reproducible; I get this in system tests of client applications.

ZK client 3.5.3; the stacktrace is self-explanatory:
{code:java}
java.lang.NullPointerException
    at org.apache.zookeeper.ClientCnxnSocketNetty.onClosing(ClientCnxnSocketNetty.java:206)
    at org.apache.zookeeper.ClientCnxn$SendThread.close(ClientCnxn.java:1395)
    at org.apache.zookeeper.ClientCnxn.disconnect(ClientCnxn.java:1440)
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1467)
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1319){code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 3 weeks ago 0|i3ozyf:
ZooKeeper ZOOKEEPER-2965

prevent DNS queries spam

Improvement Open Minor Unresolved Unassigned Philippe Serreault Philippe Serreault 09/Jan/18 12:16   29/Aug/19 07:50       c client   0 2   Hello,

First of all, some context about the issue and why it became quite apparent to me:
* I'm using the native zookeeper client on linux
* I'm not declaring -DTHREADED
* My zookeeper ensemble is made of server names that need to be resolved
* The ensemble and DNS servers are "next" to each other
* My client is "far" and uses an unreliable network path that can drop UDP requests

For each run of the client's main loop, all servers in the ensemble are resolved, even if no change in the server list occurred (zookeeper_interest .. update_addrs .. resolve_hosts).
In my situation, DNS requests could time out and would trigger a reconnection to the ensemble.

Please find attached a patch that prevents DNS queries when the hostname has not changed.

Best regards,
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
29 weeks ago 0|i3op07:
ZooKeeper ZOOKEEPER-2964

"Conf" command returns dataDir and dataLogDir opposingly

Bug Resolved Minor Fixed Unassigned Qihong Xu Qihong Xu 08/Jan/18 01:51   04/Oct/19 10:55 18/Jan/18 19:13 3.5.3, 3.6.0 3.5.4, 3.6.0 server   0 7   I found a bug where the "conf" command returns dataDir and dataLogDir swapped.

This bug only exists in versions newer than 3.5. I only found that dumpConf in [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188] prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are not affected and the server functions correctly.

I made a small patch to fix this bug. Any review is appreciated.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 9 weeks ago 0|i3omfr:
ZooKeeper ZOOKEEPER-2963

standalone

Bug Resolved Major Invalid maoling wu xiaoxue wu xiaoxue 04/Jan/18 04:28   31/Dec/18 11:39 17/Jan/18 19:01         0 3 0 1200   Today is China's New Year's Day. I am still a single dog.
When reading this line's code annotation, I burst into tears.
My New Year's resolution is girlfriend(s)!!!!!!!!!!!!!!!!!!!!!!!
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 5 weeks, 1 day ago 0|i3oi2n:
ZooKeeper ZOOKEEPER-2962

The function queueEmpty() in FastLeaderElection.Messenger is not used, should be removed.

Improvement Resolved Minor Fixed Unassigned Jiafu Jiang Jiafu Jiang 26/Dec/17 02:50   19/Mar/18 23:45 08/Mar/18 12:52 3.4.11 3.4.12 leaderElection   0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 2 days ago 0|i3oa6n:
ZooKeeper ZOOKEEPER-2961

Fix testElectionFraud Flakyness

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 22/Dec/17 17:10   18/Jan/18 19:33 18/Jan/18 19:05 3.5.3, 3.4.11, 3.6.0 3.5.4, 3.6.0, 3.4.12     0 4   This test relies on hooking into our logging system and creates a new appender using a PatternLayout object shared with the CONSOLE appender. PatternLayout has some synchronization issues (https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html) so we should create a new instance of it. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 9 weeks ago 0|i3o8k7:
ZooKeeper ZOOKEEPER-2960

The dataDir and dataLogDir are used opposingly

Bug Resolved Critical Fixed Andor Molnar Dan Milon Dan Milon 22/Dec/17 12:17   04/Oct/19 10:55 08/Jan/18 04:40 3.4.11 3.4.12 server   0 9   After upgrading from ZooKeeper 3.4.5 to 3.4.11, without editing {{zoo.cfg}}, the new version of the server tries to use the {{dataDir}} as the {{dataLogDir}} and the {{dataLogDir}} as the {{dataDir}}, or at least some parts of the server do.

Configuration file has:
{noformat}
$ grep -i data /etc/zookeeper/zoo.cfg
dataLogDir=/var/lib/zookeeper/datalog
dataDir=/var/lib/zookeeper/data
{noformat}

But runtime configuration has:
{noformat}
$ echo conf | nc localhost 2181 | grep -i data
dataDir=/var/lib/zookeeper/datalog/version-2
dataLogDir=/var/lib/zookeeper/data/version-2
{noformat}

Also, I got this in the debug logs, so clearly some parts of the server confuse things.

{noformat}
[PurgeTask:FileTxnSnapLog@79] - Opening datadir:/var/lib/zookeeper/datalog snapDir:/var/lib/zookeeper/data
[main:FileTxnSnapLog@79] - Opening datadir:/var/lib/zookeeper/data snapDir:/var/lib/zookeeper/datalog
{noformat}

I tried to look in the code for wrong uses of the directories. I only found that [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L227] passes the arguments to {{FileTxnSnapLog}} in the wrong order, but the code comment says this is legacy and only for tests, so I assume it isn't the cause in my case.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 1 week, 3 days ago 0|i3o87z:
ZooKeeper ZOOKEEPER-2959

ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

Bug Closed Blocker Fixed Bogdan Kanivets xiangyq000 xiangyq000 20/Dec/17 01:46   19/Mar/19 20:26 10/May/18 00:03 3.4.10, 3.5.3 3.5.4, 3.6.0, 3.4.13     1 7 0 3600   Once the ZooKeeper cluster finishes the election of a new leader, all learners report their accepted epoch to the leader for the computation of the new cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
private final HashSet<Long> connectingFollowers = new HashSet<Long>();

public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
    synchronized (connectingFollowers) {
        if (!waitingForNewEpoch) {
            return epoch;
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connectingFollowers.add(sid);
        QuorumVerifier verifier = self.getQuorumVerifier();
        if (connectingFollowers.contains(self.getId()) &&
                verifier.containsQuorum(connectingFollowers)) {
            waitingForNewEpoch = false;
            self.setAcceptedEpoch(epoch);
            connectingFollowers.notifyAll();
        } else {
            long start = Time.currentElapsedTime();
            long cur = start;
            long end = start + self.getInitLimit() * self.getTickTime();
            while (waitingForNewEpoch && cur < end) {
                connectingFollowers.wait(end - cur);
                cur = Time.currentElapsedTime();
            }
            if (waitingForNewEpoch) {
                throw new InterruptedException("Timeout while waiting for epoch from quorum");
            }
        }
        return epoch;
    }
}
{code}

The computation produces an outcome once:
# the leader has called the method "getEpochToPropose", and
# the number of reporters is greater than half the number of participants.

The problem is that an observer server will also send its accepted epoch to the leader, while this procedure treats observers as participants.

Suppose the cluster consists of 1 leader, 2 followers and 1 observer, and that the leader and the observer have reported their accepted epochs while neither of the followers has. The connectingFollowers set then contains two elements, and its size of 2 satisfies the majority check for 3 participants. QuorumVerifier#containsQuorum will therefore return true, because it does not check whether the elements of the parameter are participants.

The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
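The counting flaw can be illustrated with a toy majority check. This is an illustrative sketch, not ZooKeeper's actual code: the naive check mirrors how connectingFollowers is sized against the participant count, and the "fixed" variant shows one possible remedy of ignoring non-participant sids.

```java
import java.util.HashSet;
import java.util.Set;

public class EpochQuorumSketch {
    // Hypothetical 3-participant ensemble (leader + 2 followers); sid 4 is an observer.
    static final Set<Long> PARTICIPANTS = Set.of(1L, 2L, 3L);

    // Naive majority check: counts every acked sid, observers included.
    static boolean naiveContainsQuorum(Set<Long> acked) {
        return acked.size() > PARTICIPANTS.size() / 2;
    }

    // Possible remedy: only count sids that are actual participants.
    static boolean fixedContainsQuorum(Set<Long> acked) {
        Set<Long> voters = new HashSet<>(acked);
        voters.retainAll(PARTICIPANTS);
        return voters.size() > PARTICIPANTS.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> acked = new HashSet<>();
        acked.add(1L); // the leader itself
        acked.add(4L); // the observer's accepted-epoch ack
        System.out.println(naiveContainsQuorum(acked)); // true: epoch settled too early
        System.out.println(fixedContainsQuorum(acked)); // false: still waiting for a follower
    }
}
```

With only the leader and the observer acked, the naive check already declares a quorum, which is exactly the premature-epoch scenario described above.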
100% 100% 3600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 44 weeks, 3 days ago 0|i3o43b:
ZooKeeper ZOOKEEPER-2958

Don't reconnect zookeeper server when tomcat stopped

Improvement Open Major Unresolved Unassigned Zhaohui Yu Zhaohui Yu 18/Dec/17 03:49   18/Dec/17 03:49       java client   0 1   If you run the ZooKeeper client in Tomcat:
1. a ZooKeeper client is created that connects to the ZooKeeper server
2. the client connects to the ZooKeeper server
3. the webapp is stopped for some other reason, so the WebappClassLoader in Tomcat can no longer load new classes
4. the run method in ClientCnxn.SendThread has a while loop that catches all throwables, so the client will reconnect to the server and then repeat these steps forever

So I suggest adding a StateChecker interface that users can override:
{code:java}
public class ClientCnxn {
    public class SendThread extends Thread {
        public void run() {
            while (stateChecker.check()) {
                // ...
            }
        }
    }
}
{code}

This way I can pass a StateChecker that checks the Tomcat WebappClassLoader state.
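A minimal sketch of what such a hook could look like; the StateChecker interface and the webappChecker helper are hypothetical, following this report's suggestion rather than any existing ZooKeeper API:

```java
public class StateCheckerSketch {
    // The proposed hook: the send loop keeps running only while check() is true.
    interface StateChecker {
        boolean check();
    }

    // Example checker: treat the webapp as alive while its classloader still works.
    static StateChecker webappChecker(ClassLoader webappLoader) {
        return () -> {
            try {
                webappLoader.loadClass("java.lang.Object");
                return true;  // classloader usable: keep the send loop going
            } catch (Throwable t) {
                return false; // webapp stopped: let the loop exit instead of reconnecting
            }
        };
    }

    public static void main(String[] args) {
        StateChecker checker = webappChecker(StateCheckerSketch.class.getClassLoader());
        int iterations = 0;
        while (checker.check() && iterations < 3) { // stand-in for the send loop
            iterations++;
        }
        System.out.println(iterations);
    }
}
```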

Thanks
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 13 weeks, 3 days ago 0|i3o0af:
ZooKeeper ZOOKEEPER-2957

Whether zookeeper client-server nio or netty communication can use socket proxies

Wish Open Major Unresolved Unassigned mugulong mugulong 15/Dec/17 04:03   05/Mar/18 05:33           0 4   Hello:
I plan to use ZooKeeper for distributed consistency in a project, but the project requires proxies for communication between the ZooKeeper servers and between server and client. In actual testing, the ZooKeeper servers can communicate with each other normally through a SOCKS5 socket proxy, but the ZooKeeper client and server cannot communicate through the same SOCKS5 socket proxy. I found that the client and server communicate through Java NIO or Netty, and Java NIO/Netty differs from java.io in its use of proxies.
So, can ZooKeeper client-server NIO or Netty communication use socket proxies?
Remark: the ZooKeeper version I use is 3.4.6.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 2 weeks, 3 days ago 0|i3nxo7:
ZooKeeper ZOOKEEPER-2956

Cannot delete znode that owns too many child znodes by `rmr` command

Bug Open Major Unresolved Benedict Jin Benedict Jin Benedict Jin 15/Dec/17 03:43   19/Dec/19 10:28       server   1 6   We cannot delete a znode that owns too many child znodes with the `rmr` command, because the list of child znodes can be 172 MB, which is far too large for the default value of `jute.maxbuffer` (1 MB). In fact, we shouldn't be affected by the number of child znodes when we want to delete such znodes recursively. 9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
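As a client-side workaround sketch (not a fix for the underlying issue), the buffer limit can be raised through the real `jute.maxbuffer` system property before the client connects; the 256 MB value is only illustrative, and the servers would need a matching limit for the large getChildren response to pass end to end.

```java
public class JuteMaxBufferWorkaround {
    public static void main(String[] args) {
        // jute.maxbuffer defaults to roughly 1 MB; raise it before any
        // ZooKeeper client in this JVM is created.
        System.setProperty("jute.maxbuffer", String.valueOf(256 * 1024 * 1024));
        System.out.println(Integer.getInteger("jute.maxbuffer")); // 268435456
    }
}
```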
13 weeks ago 0|i3nxn3:
ZooKeeper ZOOKEEPER-2955

Enable Clover code coverage report

Test Closed Major Fixed Mark Fenes Mark Fenes Mark Fenes 13/Dec/17 09:22   31/Jan/19 10:07 19/May/18 20:38 3.5.3, 3.4.11, 3.6.0 3.6.0, 3.4.13, 3.5.5 tests   0 4 0 7200   We have limited code coverage support in ZK. Clover for Java was running in the past but was turned off.
Enable the Clover code coverage report to make us more confident in the quality, stability and compatibility of future ZK releases.
100% 100% 7200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 43 weeks, 4 days ago
Reviewed
0|i3nuxz:
ZooKeeper ZOOKEEPER-2954

ant compile_jute is failing for zookeeper 3.4.11

Bug Open Major Unresolved Unassigned Aditya Pawaskar Aditya Pawaskar 13/Dec/17 05:26   18/Dec/17 00:14   3.4.11   build, jute   0 3   Operating system- Ubuntu 16.04
Platform- x86_64
I run Apache ZooKeeper 3.4.11 using OpenJDK 8, with the source code cloned from git.
When I run the 'ant compile_jute' command, I get the following error message:

{noformat}
Buildfile: /root/zookeeper/build.xml

init:

jute:
[javac] Compiling 39 source files to /root/zookeeper/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /root/zookeeper/src/java/main/org/apache/jute/Record.java:21: error: package org.apache.yetus.audience does not exist
[javac] import org.apache.yetus.audience.InterfaceAudience;
[javac] ^
[javac] /root/zookeeper/src/java/main/org/apache/jute/Record.java:29: error: package InterfaceAudience does not exist
[javac] @InterfaceAudience.Public
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors
[javac] 1 warning

BUILD FAILED
/root/zookeeper/build.xml:315: Compile failed; see the compiler error output for details.
{noformat}

According to the error, ant is unable to find InterfaceAudience, which is part of audience-annotations-0.5.0.jar mentioned in build.xml.
When I searched for this jar file, I could not find it in the source code.

Thanks and Regards,
Aditya
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 13 weeks, 3 days ago 0|i3nul3:
ZooKeeper ZOOKEEPER-2953

Flaky Test: testNoLogBeforeLeaderEstablishment

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 12/Dec/17 19:24   15/Dec/17 21:05 15/Dec/17 19:49 3.5.3, 3.4.11, 3.6.0 3.5.4, 3.6.0, 3.4.12     0 4   testNoLogBeforeLeaderEstablishment has been flaky on 3.4, 3.5, and master for quite a while. My understanding is that the purpose of the test is to make sure that a server receives support from the quorum before changing the epoch and acting as leader.

There are a couple issues with the test in its current state. First, the assertions the test makes are not always true. It is possible, if the zookeeper database is not cleared, for a follower to be ahead of a leader when the quorum is shutdown. That follower will then likely become leader when the quorum is restarted. This is the cause of the flaky behavior. Second, the test does not appear to create the conditions it wants to test for. Since, ZOOKEEPER-335 (specifically the ZOOKEEPER-1081 subtask) we take the epoch into consideration in {{FastLeaderElection}} so the test no longer "believes it is the leader once it recovers".

After discussing the issue offline with [~phunt] we decided it would still be valuable to test the situation where a server is elected leader without the support of the quorum. So I removed {{testNoLogBeforeLeaderEstablishment}} and created a new test called {{testElectionFraud}}.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 13 weeks, 5 days ago
Reviewed
0|i3nu0f:
ZooKeeper ZOOKEEPER-2952

Upgrade third party libraries to address vulnerabilities

Improvement Resolved Critical Fixed Andor Molnar Andor Molnar Andor Molnar 12/Dec/17 05:38   14/Dec/17 02:26 13/Dec/17 17:02 3.5.3, 3.4.11, 3.6.0 3.5.4, 3.6.0, 3.4.12 server   0 3   Hi,

I'm going to upgrade the following third party libraries in order to address vulnerabilities found in them:

- io.netty:netty 3.10.5.Final -> 3.10.6.Final (CVE-2015-2156 (H), CVE-2014-3488 (H), protobuf: CVE-2015-5237 (H), npn-api: CVE-2017-9735 (H), CVE-1999-1198 (H), CVE-1999-1193 (H))
- org.slf4j:slf4j-api 1.7.5 -> 1.7.25
- log4j:log4j 1.2.16 -> 1.2.17

Please review the list and let me know if you have any concerns or would like to add more deps to upgrade.

Thanks,
Andor
dependencies, upgrade, vulnerabilities 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Important
2 years, 14 weeks ago 0|i3nsrb:
ZooKeeper ZOOKEEPER-2951

zkServer.cmd does not start when JAVA_HOME ends with a \

Bug Resolved Major Fixed Unassigned Jorg Heymans Jorg Heymans 11/Dec/17 03:51   12/Dec/17 14:31 12/Dec/17 13:40 3.4.11 3.5.4, 3.6.0, 3.4.12 server   0 5   Windows 7 (not tested on other Windows versions). This is the output I get (apologies for the cut-off line endings):

{noformat}
C:\RC\Tools\zookeeper-3.4.11\bin>zkServer.cmd

call "c:\RC\jdk\jdk1.8.0_121\"\bin\java "-Dzookeeper.log.dir=C:\RC\Tools\zookeeper-3.4.11\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "C:\RC\Tools\zookeeper-3.4.11\bin\..\build\classes;C:\RC\Tools\zookeeper-3.4.11\bin\..\build\lib\*;C:\RC\Tools\zookeeper-3.4.11\bin\..\*;C:\RC\Tools\zookeeper-
3.4.11\bin\..\lib\*;C:\RC\Tools\zookeeper-3.4.11\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "C:\RC\Tools\zookeeper-3.4.11\bin\..\c
onf\zoo.cfg"
Usage: java [-options] class [args...]
(to execute a class)
or java [-options] -jar jarfile [args...]
(to execute a jar file)
where options include:
-d32 use a 32-bit data model if available
-d64 use a 64-bit data model if available
-server to select the "server" VM
The default VM is server.

-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
A ; separated list of directories, JAR archives,
and ZIP archives to search for class files.
-D<name>=<value>
set a system property
-verbose:[class|gc|jni]
enable verbose output
-version print product version and exit
-version:<value>
Warning: this feature is deprecated and will be removed
in a future release.
require the specified version to run
-showversion print product version and continue
-jre-restrict-search | -no-jre-restrict-search
Warning: this feature is deprecated and will be removed
in a future release.
include/exclude user private JREs in the version search
-? -help print this help message
-X print help on non-standard options
-ea[:<packagename>...|:<classname>]
-enableassertions[:<packagename>...|:<classname>]
enable assertions with specified granularity
-da[:<packagename>...|:<classname>]
-disableassertions[:<packagename>...|:<classname>]
disable assertions with specified granularity
-esa | -enablesystemassertions
enable system assertions
-dsa | -disablesystemassertions
disable system assertions
-agentlib:<libname>[=<options>]
load native agent library <libname>, e.g. -agentlib:hprof
see also, -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:<pathname>[=<options>]
load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
load Java programming language agent, see java.lang.instrument
-splash:<imagepath>
show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.

endlocal

{noformat}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 14 weeks, 2 days ago 0|i3nr6v:
ZooKeeper ZOOKEEPER-2950

Add keys for the Zxid from the stat command to check_zookeeper.py

Improvement Resolved Trivial Fixed Alex Bame Alex Bame Alex Bame 05/Dec/17 13:48   12/Dec/17 14:31 12/Dec/17 13:45 3.5.3, 3.4.11, 3.6.0 3.5.4, 3.6.0, 3.4.12 scripts   0 3 3600 3600 0% Add keys for the zxid and its component pieces: epoch and transaction counter. These are not reported by the 'mntr' command so they must be obtained from 'stat'. The counter is useful for tracking transaction rates, and epoch is useful for tracking leader churn.

zk_zxid - the 64bit zxid from ZK
zk_zxid_counter - the lower 32 bits, AKA the counter
zk_zxid_epoch - the upper 32 bits, AKA the epoch
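Splitting a zxid into these two pieces is plain bit arithmetic; a minimal sketch (the sample zxid value is just an example):

```java
public class ZxidParts {
    // The epoch lives in the upper 32 bits of the 64-bit zxid.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // The transaction counter lives in the lower 32 bits.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long zxid = 0x400000001L;
        System.out.println(epoch(zxid));   // 4
        System.out.println(counter(zxid)); // 1
    }
}
```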
0% 0% 3600 3600 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 14 weeks, 2 days ago https://github.com/apache/zookeeper/pull/425 0|i3nkbb:
ZooKeeper ZOOKEEPER-2949

SSL ServerName not set when using hostname, some proxies may failed to proxy the request.

Bug Resolved Major Fixed Unassigned Feng Shaobao Feng Shaobao 26/Nov/17 22:26   30/Jan/18 16:26 30/Jan/18 15:58 3.5.3 3.5.4, 3.6.0 java client   0 4 43200 43200 0% In our environment, the ZooKeeper clusters are all behind a proxy. The proxy decides where to forward a client's request based on the "ServerName" field in the SSL Hello packet (the proxy serves SSL only), but the Hello packets the ZK client sends to the proxy do not contain the "ServerName" field. After inspecting the code, we found this is because the ZK client does not specify the peerHost when initializing the SSLContext.

In the initSSL method of the ZKClientPipelineFactory class, the SSLEngine is initialized like this:

sslEngine = sslContext.createSSLEngine();

Actually, SSLContext provides another factory method that receives hostName and port parameters:

public final SSLEngine createSSLEngine(String hostName, int port)

If we call this method to create the SSLEngine, the proxy will know which ZK cluster the client really wants to access.
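A sketch of the idea in plain JSSE (the host name and port below are made-up placeholders): creating the engine with the peer's host and port gives the provider the advisory peer information it can use for the SNI server_name extension in the ClientHello when the engine is in client mode.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class SniEngineSketch {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        // Host/port variant of createSSLEngine: the advisory peer host is
        // what JSSE can use for SNI (and for session caching).
        SSLEngine engine = ctx.createSSLEngine("zk1.example.com", 2281);
        engine.setUseClientMode(true);
        System.out.println(engine.getPeerHost()); // zk1.example.com
        System.out.println(engine.getPeerPort()); // 2281
    }
}
```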
0% 0% 43200 43200 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 7 weeks, 2 days ago ssl proxy 0|i3n78v:
ZooKeeper ZOOKEEPER-2948

Failing c unit tests on apache jenkins

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 21/Nov/17 15:44   26/Mar/18 14:25 22/Nov/17 12:46 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12     0 5   Looks like someone is creating our test files outside of jenkins. I modified the job to output our id and look at the perms on those files:

----
[ZooKeeper-trunk] $ /bin/bash /tmp/jenkins291402182647699851.sh
uid=910(jenkins) gid=910(jenkins) groups=910(jenkins),999(docker)

drwxr-xr-x 3 10025 12036 4096 Nov 10 01:39 /tmp/zkdata
-rw-r--r-- 1 10025 12036 2 Nov 10 01:39 /tmp/zkdata/myid

/tmp/zkdata/version-2:
total 20
drwxr-xr-x 2 10025 12036 4096 Oct 22 23:35 .
drwxr-xr-x 3 10025 12036 4096 Nov 10 01:39 ..
-rw-r--r-- 1 10025 12036 1 Oct 22 23:35 acceptedEpoch
-rw-r--r-- 1 10025 12036 1 Oct 22 23:35 currentEpoch
-rw-r--r-- 1 10025 12036 562 Oct 22 23:35 snapshot.0
----

Notice that it's not jenkins.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 17 weeks, 1 day ago 0|i3n2iv:
ZooKeeper ZOOKEEPER-2947

Find a reasonable way to handle sequential node counter overflowing

Bug Open Major Unresolved Unassigned Abraham Fine Abraham Fine 21/Nov/17 14:48   21/Nov/17 15:03   3.5.3, 3.4.11, 3.6.0       0 1   This is a follow on for ZOOKEEPER-2944. We should raise some sort of error when the counter for sequential nodes overflows, rather than silently overflowing and creating nodes with a negative counter. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
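A small sketch of the silent wrap-around; the "%010d" zero-padded suffix matches how sequential nodes are named, while the /lock- prefix is just an example:

```java
import java.util.Locale;

public class SequentialCounterOverflow {
    // Sequential znodes get a zero-padded 10-digit suffix from an int counter.
    static String seqName(String prefix, int counter) {
        return prefix + String.format(Locale.ROOT, "%010d", counter);
    }

    public static void main(String[] args) {
        // The counter wraps past Integer.MAX_VALUE to a negative value
        // with no error, producing oddly named nodes.
        System.out.println(seqName("/lock-", Integer.MAX_VALUE));     // /lock-2147483647
        System.out.println(seqName("/lock-", Integer.MAX_VALUE + 1)); // /lock--2147483648
    }
}
```

The wrapped name sorts before all normal sequence names, which is why silently overflowing is worse than raising an error.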
2 years, 17 weeks, 2 days ago 0|i3n2gn:
ZooKeeper ZOOKEEPER-2946

The truncate() function in FileTxnLog.java may fail to properly remove an uncommitted write resulting in data inconsistency

Bug Open Major Unresolved Unassigned Beom Heyn Kim Beom Heyn Kim 21/Nov/17 05:06   27/Nov/17 14:29   3.4.11, 3.4.12       1 2   The truncate() function in FileTxnLog.java may fail to properly remove an uncommitted write. This happens when a follower that has uncommitted writes tries to resync with the leader after a few epochs have passed. The failure results in data inconsistency in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency.

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0, 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 2
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0
b. Async setData to the node 2 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Restart the node 1 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 2
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 1003
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.

What seems to happen:
# At step 7, the setData (at zxid 0x400000001) writing value 1003 is committed on node 2.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2 cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - INFO [SyncThread:2:FileTxnLog@209] - Creating new log file: log.400000001
2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count for zxid: 0x400000001 is 1
2017-11-16 03:08:15,752 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
2017-11-16 03:08:15,753 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 2 ]
...
{panel}
# At step 10, node 2 is restarted and is supposed to be properly resynced with node 0, which is the leader.
a. Node 0 sends a TRUNC message so that node 2 can truncate the setData at zxid 0x400000001.
b. Node 2 then tries to truncate its log to get in sync with the leader at 0x200000001. However, node 2 fails to properly truncate the setData at zxid 0x400000001. So even after resync finishes, the value 1003 remains intact on node 2 while the other nodes have value 3 for the same key.
c. On node 2 there is only log.100000001 and log.400000001 but no log.200000001. This seems to cause the failure to delete log.400000001 during truncate(). It looks like the iterator is already positioned on log.400000001 by the time init() of FileTxnLog.java returns, so 'itr.logFile.delete()' is never executed for log.400000001.
d. Then, after returning from truncate(), loadDatabase() is invoked, log.400000001 is read, and the setData at zxid 0x400000001 is loaded into the in-memory data tree.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:59,051 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 215
2017-11-16 03:08:59,052 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] - Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:08:59,125 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to get in sync with the leader 0x200000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,131 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001 expected 0x1
2017-11-16 03:08:59,132 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting: 0x500000004 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
...
{panel}
{panel:title=Log from the node 0:}
...
2017-11-16 03:08:59,050 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER ELECTION TOOK - 222
2017-11-16 03:08:59,055 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
2017-11-16 03:08:59,124 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001 peerLastZxid=0x400000001
2017-11-16 03:08:59,124 [myid:0] - DEBUG [LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
2017-11-16 03:08:59,124 [myid:0] - DEBUG [LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to follower
2017-11-16 03:08:59,124 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
2017-11-16 03:08:59,147 [myid:0] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] - Accepted socket connection from /127.0.0.1:55118
2017-11-16 03:08:59,184 [myid:0] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:08:59,184 [myid:0] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:08:59,184 [myid:0] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:55118 (no session established for client)
2017-11-16 03:08:59,224 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
2017-11-16 03:08:59,224 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of supporters, sids: [ 0,2 ]; starting up and setting last processed zxid: 0x600000000
...
{panel}
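The suspected mechanism in item c (truncate() never deleting log.400000001 because the iterator has already advanced past the gap) can be sketched as a toy model. This is an illustration of the hypothesis, not the actual FileTxnLog code; file contents and method shapes are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT the real FileTxnLog code) of which txn log files
// truncate() ends up deleting. Each file is a long[] of the txn zxids it
// holds; files are listed in ascending order and named by their first zxid.
public class TruncateGapModel {

    // Returns the starting zxids of the files the model deletes.
    public static List<Long> deletedOnTruncate(long[][] files, long truncZxid) {
        // init(): start at the newest file whose first txn is <= truncZxid.
        int current = 0;
        for (int i = 0; i < files.length; i++) {
            if (files[i][0] <= truncZxid) {
                current = i;
            }
        }
        // Scan forward until a txn >= truncZxid; hitting EOF on a file
        // silently advances the iterator onto the next file.
        int pos = 0;
        while (current < files.length) {
            if (pos >= files[current].length) { current++; pos = 0; continue; }
            if (files[current][pos] >= truncZxid) break;
            pos++;
        }
        // The current file is only truncated in place; just the files AFTER
        // it are deleted. With a gap in the log, the iterator has already
        // advanced onto log.400000001, so nothing is deleted.
        List<Long> deleted = new ArrayList<>();
        for (int i = current + 1; i < files.length; i++) {
            deleted.add(files[i][0]);
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Gap case from the report: log.100000001 and log.400000001 exist,
        // log.200000001 is missing; truncate to 0x200000001.
        long[][] gap = { {0x100000001L}, {0x400000001L} };
        System.out.println(deletedOnTruncate(gap, 0x200000001L)); // [] -> stale file survives

        // Contiguous case: log.200000001 exists, so log.400000001 is deleted.
        long[][] ok = { {0x100000001L}, {0x200000001L}, {0x400000001L} };
        System.out.println(deletedOnTruncate(ok, 0x200000001L));
    }
}
```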
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 16 weeks, 3 days ago 0|i3n1gn:
ZooKeeper ZOOKEEPER-2945

Synchronization code on the follower does not properly truncate uncommitted write resulting in data inconsistency

Bug Patch Available Major Unresolved Unassigned Beom Heyn Kim Beom Heyn Kim 20/Nov/17 03:23   27/Nov/17 14:23   3.4.11, 3.4.12       1 4   Synchronization code in the syncWithLeader function of Learner.java doesn't seem to truncate uncommitted writes properly when the follower receives a SNAP msg from the leader. This results in data inconsistency in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
The value of key /testDivergenceResync2 is inconsistent across nodes -- node 1 has a new value that will never be replicated to the other nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x300000001) writing the value 1002 is committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.300000001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x300000001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and is supposed to be properly resync’ed with the node 2, which is the leader.
a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2.
b. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x200000001.
c. However, this does not remove the setData at zxid 0x300000001 from the transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x300000001
2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x300000001 zxid of leader is 0x500000000sent zxid of db as 0x200000001
2017-11-16 03:02:37,701 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
2017-11-16 03:02:37,702 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962] - Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x500000000
...
{panel}
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:37,473 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 218
2017-11-16 03:02:37,475 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184] - Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:02:37,593 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxnFactory@215] - Accepted socket connection from /127.0.0.1:57338
2017-11-16 03:02:37,626 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@336] - Getting a snapshot from leader 0x200000001
2017-11-16 03:02:37,627 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@383] - Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:02:37,627 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@386] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:02:37,627 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:57338 (no session established for client)
2017-11-16 03:02:37,629 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@248] - Snapshotting: 0x200000001 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/1/version-2/snapshot.200000001
...
{panel}
# At the step 10, the node 1 is restarted again and is supposed to be properly resync’ed with the node 2, which is the leader again.
a. When the node 1 is restarted, it restores its in-memory data tree from the snapshot at zxid 0x200000001 and replays the setData at zxid 0x300000001 (which actually needed to be truncated)
b. However, the node 2 just sends a DIFF containing the setData written at step 9, and no truncation occurs.
c. As a result, the node 1 still has the value 1002 while the other nodes have the value 2 for the same key.
{panel:title=Log from the node 2:}

2017-11-16 03:03:21,033 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 217
2017-11-16 03:03:21,038 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46967:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1e1cf18c
2017-11-16 03:03:21,103 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46967:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x500000004 minCommittedLog=0x500000001 peerLastZxid=0x500000003
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@415] - proposal size is 4
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@418] - Sending proposals to follower
2017-11-16 03:03:21,103 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46967:LearnerHandler@475] - Sending DIFF
2017-11-16 03:03:21,156 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxnFactory@215] - Accepted socket connection from /127.0.0.1:49611
2017-11-16 03:03:21,178 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46967:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
2017-11-16 03:03:21,178 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962] - Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x600000000
2017-11-16 03:03:21,196 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@383] - Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:03:21,196 [myid:2] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@386] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:03:21,196 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:49611 (no session established for client)
2017-11-16 03:03:21,237 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:Leader@579] - outstanding is 0
...
{panel}
{panel:title=Log from the node 1:}

2017-11-16 03:03:21,034 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 222
2017-11-16 03:03:21,035 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184] - Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:03:21,104 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@332] - Getting a diff from the leader 0x500000004
2017-11-16 03:03:21,105 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@387] - Got zxid 0x500000004 expected 0x1
2017-11-16 03:03:21,189 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.500000004
2017-11-16 03:03:21,189 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:CommitProcessor@164] - Committing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004 txntype:5 reqpath:n/a
2017-11-16 03:03:21,189 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@89] - Processing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004 txntype:5 reqpath:n/a
...
{panel}
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 16 weeks, 3 days ago 0|i3mzkf:
ZooKeeper ZOOKEEPER-2944

Specify correct overflow value

Bug Resolved Trivial Fixed Unassigned Chris Donati Chris Donati 17/Nov/17 13:46   21/Nov/17 14:49 21/Nov/17 12:50   3.5.4, 3.6.0, 3.4.12 documentation   0 4   When a sequence counter exceeds 2147483647, the next value is -2147483648.

https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Sequence+Nodes+--+Unique+Naming
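The wrap-around can be shown directly in Java, assuming the sequence counter is a signed 32-bit int (which the reported values imply):

```java
public class SequenceOverflow {
    public static void main(String[] args) {
        int counter = Integer.MAX_VALUE; // 2147483647, the largest sequence value
        int next = counter + 1;          // two's-complement wrap-around
        System.out.println(next);        // -2147483648
    }
}
```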
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 17 weeks, 2 days ago 0|i3mxtr:
ZooKeeper ZOOKEEPER-2943

Need dynamic server host resolve in client

Improvement Open Major Unresolved Unassigned Liu Sixian Liu Sixian 16/Nov/17 21:44   16/Nov/17 21:44   3.4.10   java client   0 2   ZooKeeper's Java client provides the StaticHostProvider to hold server addresses. Host name resolution happens only once, when the StaticHostProvider is instantiated. In some situations we use host names instead of IP addresses to smooth over the movement of backend servers, but the client never gets a chance to resolve the host names again. How can I do this? 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
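One way to address the request is a provider that re-resolves on every call. The sketch below is hypothetical (it does not implement ZooKeeper's real HostProvider interface); in practice the injected resolver would wrap InetAddress.getAllByName, and the injectable shape just makes the re-resolution behavior easy to see:

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not org.apache.zookeeper.client.StaticHostProvider:
// re-resolve the configured host names on every next() call instead of once
// at construction, so DNS changes are picked up.
public class ReResolvingHostProvider {
    private final List<String> hostNames;
    private final Function<String, List<String>> resolver; // e.g. a DNS lookup
    private int index = 0;

    public ReResolvingHostProvider(List<String> hostNames,
                                   Function<String, List<String>> resolver) {
        this.hostNames = hostNames;
        this.resolver = resolver;
    }

    // Returns a freshly resolved address for the next host, round-robin.
    public String next() {
        String host = hostNames.get(index);
        index = (index + 1) % hostNames.size();
        List<String> addrs = resolver.apply(host); // resolved again each call
        return addrs.get(0);
    }

    public static void main(String[] args) {
        String[] ip = { "10.0.0.1" };            // mutable fake DNS record
        ReResolvingHostProvider p = new ReResolvingHostProvider(
                List.of("zk1.example.com"), host -> List.of(ip[0]));
        System.out.println(p.next()); // 10.0.0.1
        ip[0] = "10.0.0.2";           // simulate a DNS change
        System.out.println(p.next()); // 10.0.0.2 -- picked up immediately
    }
}
```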
2 years, 17 weeks, 6 days ago 0|i3mwjr:
ZooKeeper ZOOKEEPER-2942

All 3.4.11 documentation links on site return 404

Bug Resolved Major Cannot Reproduce Unassigned Jack Foy Jack Foy 15/Nov/17 16:47   05/Jan/18 18:08 05/Jan/18 18:08 3.4.11   documentation   0 1   n/a http://zookeeper.apache.org/doc/r3.4.11 and links under that location all return 404, including the Release Notes link on http://zookeeper.apache.org/releases.html under "9 November, 2017: release 3.4.11 available". 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 10 weeks, 6 days ago 0|i3mtyn:
ZooKeeper ZOOKEEPER-2941

support client FIFO client order, not only channel FIFO client order

New Feature Open Major Unresolved Unassigned chenzongzhi chenzongzhi 15/Nov/17 03:01   15/Nov/17 03:01           0 3   Right now, ZooKeeper only promises channel FIFO order; in this case the third operation may arrive before the second operation:

Since ZooKeeper promises that these operations are sent in a pipeline, a later operation does not need to wait for the prior one's confirmation. Consider the three operations:
1. set a = 1
2. set b = 1
3. set ready = true

These three operations are sent in a pipeline. Suppose the first operation, set a = 1, is processed successfully and the second, set b = 1, is still on the way when something goes wrong with the leader, so the client opens a new TCP connection to the leader and then sends the last operation. Since there are now two TCP connections from the client to the server, and even though the first is closed from the client's point of view there may still be residual data on it, we cannot guarantee whether the second operation will reach the leader at all, nor whether it arrives before or after the third one. This violates client FIFO order.

We know that http://atomix.io/copycat/docs/client-interaction/#preserving-program-order provides client-level FIFO order.
How about supporting client-level FIFO order?

Thank you
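The technique the linked Copycat page describes can be modeled with per-session sequence numbers: the client stamps each operation, and the server applies only the next expected one, so residual packets from an old connection cannot be applied out of order. This is a toy model of the idea, not a proposed ZooKeeper API (a real implementation might buffer out-of-order operations rather than reject them):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of client-level FIFO: ops carry a per-session sequence number
// and the server applies only the next expected one, rejecting stale or
// out-of-order arrivals (e.g. residual data from an old TCP connection).
public class ClientFifoModel {
    private long expectedSeq = 1;
    private final List<String> applied = new ArrayList<>();

    // Returns true if the op was applied in order.
    public boolean receive(long seq, String op) {
        if (seq != expectedSeq) return false; // stale/out-of-order: not applied
        applied.add(op);
        expectedSeq++;
        return true;
    }

    public List<String> applied() { return applied; }

    public static void main(String[] args) {
        ClientFifoModel server = new ClientFifoModel();
        server.receive(1, "set a = 1");        // applied
        server.receive(3, "set ready = true"); // rejected: seq 2 not yet seen
        server.receive(2, "set b = 1");        // applied
        server.receive(3, "set ready = true"); // retried in order, applied
        System.out.println(server.applied()); // [set a = 1, set b = 1, set ready = true]
    }
}
```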
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 18 weeks, 1 day ago 0|i3mspb:
ZooKeeper ZOOKEEPER-2940

ZOOKEEPER-2933 Deal with maxbuffer as it relates to large requests from clients

Sub-task Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 09/Nov/17 07:12   20/May/19 13:50 11/Jul/18 19:41   3.6.0, 3.5.5 jute, server   0 2 0 3000   Monitor real-time Jute buffer usage as it relates to large requests from clients. 100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 36 weeks, 1 day ago 0|i3ml6f:
ZooKeeper ZOOKEEPER-2939

ZOOKEEPER-2933 Deal with maxbuffer as it relates to proposals

Sub-task Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 09/Nov/17 07:11   24/Apr/18 07:17 06/Feb/18 19:14   3.5.4, 3.6.0 jute, server   0 3   Monitor real-time Jute buffer usage as it relates to proposals. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 47 weeks, 2 days ago 0|i3ml67:
ZooKeeper ZOOKEEPER-2938

Server is unable to join quorum after connection broken to other peers

Bug Open Major Unresolved Unassigned Abhay Bothra Abhay Bothra 08/Nov/17 19:31   31/Oct/19 14:31   3.4.6       10 23   We see the following logs in the node with {{myid: 1}}
{code}
2017-11-08 15:06:28,375 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-11-08 15:06:28,375 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-11-08 15:07:28,375 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
2017-11-08 15:07:28,375 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-11-08 15:07:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-11-08 15:08:28,375 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
2017-11-08 15:08:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-11-08 15:08:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-11-08 15:09:28,376 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
2017-11-08 15:09:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-11-08 15:09:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-11-08 15:10:28,376 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
2017-11-08 15:10:28,376 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-11-08 15:10:28,377 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
{code}

On the nodes with {{myid: 2}} and {{myid: 3}}, we see connection broken events for {{myid: 1}}
{code}
2017-11-07 02:54:32,135 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-11-07 02:54:32,135 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2017-11-07 02:54:32,135 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2017-11-07 02:54:32,135 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
{code}

From the reported occurrences, it looks like this is a problem only when the node with the smallest {{myid}} loses connection.
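The tie-break rule visible in the "Have smaller server identifier" log lines can be sketched as a tiny model (an illustration of the rule, not the real QuorumCnxManager code): each pair of servers keeps only the connection initiated by the larger sid, so after a break, recovery depends on the higher-sid peers re-initiating toward the lowest-sid node.

```java
public class TieBreakModel {
    // Returns true if the connection initiated by 'fromSid' toward 'toSid'
    // survives; attempts made by the smaller sid are dropped.
    public static boolean connectionKept(long fromSid, long toSid) {
        return fromSid > toSid;
    }

    public static void main(String[] args) {
        // myid 1 connecting out to 2 and 3: both attempts are dropped,
        // matching the "Have smaller server identifier" log lines.
        System.out.println(connectionKept(1, 2)); // false
        System.out.println(connectionKept(1, 3)); // false
        // Recovery depends on 2 and 3 re-initiating toward 1:
        System.out.println(connectionKept(2, 1)); // true
    }
}
```

If the higher-sid peers never re-initiate after the broken connection, the smallest-sid node has no way to establish a channel, which is consistent with the reported symptom.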
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
20 weeks ago 0|i3mkkv:
ZooKeeper ZOOKEEPER-2937

zookeeper issues with handling authentication...

Bug Patch Available Major Unresolved Unassigned Sriram Chandramouli Sriram Chandramouli 08/Nov/17 11:30   15/Nov/17 22:09   3.4.6 3.4.6 server   0 3   Linux <node_name> 2.6.32-696.6.3.el6.YAHOO.20170712.4.x86_64 #1 SMP Wed Jul 12 01:40:52 UTC 2017 x86_64

-bash-4.1$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.8 (Santiago)

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

/home/y/libexec/ant/bin/ant -version
Apache Ant(TM) version 1.9.0 compiled on March 5 2013
We have created an authentication provider plugin that can authenticate clients based on the cert the client presents. Our ZooKeeper instance has been configured (and started) to authenticate and allow only certain appids. This works as intended when clients (ours are C clients) send an auth message via yca_add_auth containing the cert *and* the authentication provider is configured to allow it.

However, if the clients do *not* present one (i.e. do not send an auth packet), and the authentication provider allows only certain appids, the connection still goes through - i.e. clients are able to connect, create/watch nodes, etc.! This is unexpected and does *not* allow us to prevent certain clients from connecting to a ZooKeeper quorum, as they can still connect without presenting any credentials.

It looks like ZooKeeper will only invoke the auth providers if it receives an auth packet from the client.

None of this block - https://github.com/sriramch/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1060

ever gets executed; processing jumps directly to this:

https://github.com/sriramch/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1108

We have a use case where we only want clients that can present valid credentials to connect to ZooKeeper (ZK).

I was hoping to expose an interface where each auth provider, when loaded, would let ZK know whether it needs to authenticate a client before other data packets are processed. The default providers (kerberos/ip/digest, etc.) would say no, to maintain compatibility; our auth provider could be configured to say yes or no (default no) depending on the use case. Before processing a data packet, ZK can look at the auth info on the server connection to see which schemes require authentication and which have successfully authenticated. The connection succeeds only if all schemes that require authentication have successfully authenticated; otherwise, we disable receive.

Can someone please look into this issue and evaluate the proposal? I can work on creating a PR for this.
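The proposed check can be sketched as follows. The class and method names here are hypothetical, invented for illustration; they are not part of ZooKeeper's actual ServerCnxn/AuthenticationProvider API:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed gating (hypothetical API, not ZooKeeper's): data
// packets are processed only once every scheme that requires authentication
// has successfully authenticated on this connection.
public class AuthGate {
    private final Set<String> required = new HashSet<>();
    private final Set<String> authenticated = new HashSet<>();

    public void requireScheme(String scheme)     { required.add(scheme); }
    public void markAuthenticated(String scheme) { authenticated.add(scheme); }

    // True if the connection may process data packets.
    public boolean mayProcessData() {
        return authenticated.containsAll(required);
    }

    public static void main(String[] args) {
        AuthGate gate = new AuthGate();
        gate.requireScheme("x509");                // config: cert auth mandatory
        System.out.println(gate.mayProcessData()); // false: no auth packet yet
        gate.markAuthenticated("x509");            // provider accepted the cert
        System.out.println(gate.mayProcessData()); // true
    }
}
```

With no required schemes configured, mayProcessData() is always true, which preserves today's behavior for deployments that do not opt in.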
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 18 weeks ago 0|i3mjvj:
ZooKeeper ZOOKEEPER-2936

Duplicate Keys in log4j.properties config files

Bug Resolved Trivial Fixed Unassigned Hari Sekhon Hari Sekhon 08/Nov/17 10:51   02/Mar/18 17:35 02/Mar/18 17:13 3.4.8, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12 contrib, other   0 5   Apache ZooKeeper source tarball Recent versions of ZooKeeper have introduced the following duplicate keys into the contrib log4j.properties files.

In this file:
{code}
./zookeeper-3.4.8/contrib/rest/conf/log4j.properties
{code}
and this file:
{code}
./zookeeper-3.4.8/src/contrib/rest/conf/log4j.properties
{code}

the following duplicate keys are found:

{code}
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
{code}

This was discovered because I've written file validators for most major formats that recurse through all my GitHub repos, and they were failing my integration tests when pulling the ZooKeeper source code. I actually added --exclude and --ignore-duplicate-keys switches to {{validate_ini.py}} to work around this and fix my builds for now, but just remembered to raise this to you guys.
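The kind of check the validator performs is straightforward to sketch. This is a minimal illustration in the spirit of that tool, not its actual implementation:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal duplicate-key check for .properties-style text: later occurrences
// of a key already seen are reported (in Java's Properties loading, the last
// occurrence silently wins, which is why duplicates go unnoticed).
public class DuplicateKeyCheck {
    public static List<String> duplicateKeys(String text) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // skip comments
            int eq = trimmed.indexOf('=');
            if (eq < 0) continue;                                       // not a key=value line
            String key = trimmed.substring(0, eq).trim();
            if (!seen.add(key)) dups.add(key);
        }
        return dups;
    }

    public static void main(String[] args) {
        String props = String.join("\n",
            "log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout",
            "log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout",
            "log4j.rootLogger=INFO, CONSOLE");
        System.out.println(duplicateKeys(props)); // [log4j.appender.ROLLINGFILE.layout]
    }
}
```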

The validator tools if you're interested can be found at:

https://github.com/harisekhon/pytools

Cheers

Hari
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 2 weeks, 6 days ago 0|i3mjtj:
ZooKeeper ZOOKEEPER-2935

ZOOKEEPER-2639 [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.5 to trunk

Sub-task Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 06/Nov/17 17:08   27/Nov/17 18:26 27/Nov/17 18:11   3.6.0 quorum, security   0 4   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 16 weeks, 3 days ago 0|i3mgtb:
ZooKeeper ZOOKEEPER-2934

c versions of election and queue recipes do not compile

Bug Resolved Major Fixed Andor Molnar Abraham Fine Abraham Fine 06/Nov/17 15:44   15/Nov/17 17:32 15/Nov/17 17:19 3.4.10, 3.5.3 3.5.4, 3.6.0 recipes   0 5   I see errors like:
{code}
/var/zookeeper/src/recipes/queue/src/c/../../../../../src/c/include/zookeeper_log.h:39:74: error: expected expression before ')' token
log_message(_cb, ZOO_LOG_LEVEL_DEBUG, __LINE__, __func__, __VA_ARGS__)
^
{code}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 18 weeks, 1 day ago 0|i3mgpb:
ZooKeeper ZOOKEEPER-2933

Ability to monitor the jute.maxBuffer usage in real-time

New Feature Closed Major Fixed Andor Molnar Andor Molnar Andor Molnar 06/Nov/17 11:22   20/May/19 13:50 02/Oct/18 05:38   3.6.0, 3.5.5 jute, server   0 5   ZOOKEEPER-2939, ZOOKEEPER-2940, ZOOKEEPER-2979, ZOOKEEPER-2980, ZOOKEEPER-2981 This is related to jute.maxbuffer problems on the server side, where the Leader generates a proposal that doesn't fit into the Follower's Jute buffer, breaking the quorum.

Proposed solution is to add the following new JMX Beans:

1. Add getJuteMaxBuffer to ZookeeperServerBean which monitors the current jute.maxbuffer setting,
2. Add get last/min/max ProposalSize to LeaderBean which monitors the size of the latest/min/max proposal.

The rationale behind this new feature is to add the capability to the JMX monitoring API to determine the current/min/max usage of the Jute buffer. This will let third-party monitoring tools sample buffer usage, build statistics, or generate alerts if usage breaches a particular value.

This will not solve the problems related to the jute.maxbuffer setting on its own, but it's intended as a first step towards better handling or prevention of production issues.

Subtasks have been created to separately implement client and server side buffer size monitoring.
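The two proposed beans can be sketched as below. The interface and attribute names are assumptions for illustration; the real names on ZookeeperServerBean/LeaderBean may differ. In practice the bean would be registered via ManagementFactory.getPlatformMBeanServer().registerMBean(...) so JMX tools can poll it:

```java
// Hypothetical MBean interface for the proposed monitoring; attribute
// names are invented, not taken from the actual ZooKeeper beans.
interface ProposalSizeMXBean {
    int getJuteMaxBufferSize();
    int getLastProposalSize();
    int getMinProposalSize();
    int getMaxProposalSize();
}

public class ProposalSizeStats implements ProposalSizeMXBean {
    private final int juteMaxBuffer;
    private int last = -1;                 // -1 until the first sample
    private int min = Integer.MAX_VALUE;   // MAX_VALUE until the first sample
    private int max = -1;

    public ProposalSizeStats(int juteMaxBuffer) {
        this.juteMaxBuffer = juteMaxBuffer;
    }

    // Called each time the leader serializes a proposal.
    public synchronized void record(int proposalSize) {
        last = proposalSize;
        min = Math.min(min, proposalSize);
        max = Math.max(max, proposalSize);
    }

    @Override public int getJuteMaxBufferSize() { return juteMaxBuffer; }
    @Override public int getLastProposalSize()  { return last; }
    @Override public int getMinProposalSize()   { return min; }
    @Override public int getMaxProposalSize()   { return max; }

    public static void main(String[] args) {
        ProposalSizeStats stats = new ProposalSizeStats(1048576); // example limit
        stats.record(1000);
        stats.record(4000);
        stats.record(2000);
        System.out.println(stats.getLastProposalSize() + " "
                + stats.getMinProposalSize() + " " + stats.getMaxProposalSize());
        // prints: 2000 1000 4000
    }
}
```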
100% 5400 0 buffer, buffer-length 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 19 weeks ago 0|i3mg93:
ZooKeeper ZOOKEEPER-2932

Performance enhancement about purging task

Improvement Open Major Unresolved Unassigned OuYang Liang OuYang Liang 06/Nov/17 04:17   01/Feb/19 08:22   3.4.10, 3.5.3   server   0 3 86400 85200 1200 1% The method FileTxnLog.getLogFiles is used to find the target log files to be retained, based on the given zxid, when the purging task is running. The current implementation of this method is easy to understand, but it iterates over the log files twice to achieve its purpose. It could be improved in both performance and readability. 1% 1% 1200 85200 86400 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
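The retention rule itself is simple: keep the newest log file starting at or before the snapshot zxid (it may still hold txns the snapshot needs) plus every later file. A single-pass sketch of that idea, working on the sorted starting zxids rather than real File objects (so this is an illustration, not the actual getLogFiles code):

```java
import java.util.ArrayList;
import java.util.List;

// Single-pass sketch of the getLogFiles idea: retain the newest log file
// starting at or before snapshotZxid, plus every later file.
public class RetainedLogFiles {
    public static List<Long> retained(long[] sortedStartZxids, long snapshotZxid) {
        List<Long> keep = new ArrayList<>();
        for (long start : sortedStartZxids) {
            if (start <= snapshotZxid) {
                keep.clear();   // a newer file still covers the snapshot zxid,
                keep.add(start); // so everything before it can be purged
            } else {
                keep.add(start); // files past the snapshot are always retained
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Files starting at zxids 1, 5, 9 with a snapshot at zxid 6:
        // log.5 still covers the snapshot, so only log.1 can be purged.
        System.out.println(retained(new long[]{1, 5, 9}, 6)); // [5, 9]
    }
}
```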
2 years, 19 weeks, 1 day ago 0|i3mfmn:
ZooKeeper ZOOKEEPER-2931

WriteLock recipe: incorrect znode ordering when the sessionId is part of the znode name

Bug Resolved Major Fixed Unassigned Javier Cacheiro Javier Cacheiro 05/Nov/17 13:56   15/Nov/17 19:27 15/Nov/17 18:36 3.4.10, 3.5.3 3.5.4, 3.6.0, 3.4.12     0 4   When the nodes are sorted in WriteLock.java using a TreeSet the whole znode path is taken into account and not just the sequence number.

This causes an issue when the sessionId is included in the znode path, because a znode with a lower sessionId will sort before another znode with a higher sessionId even if its sequence number is bigger.

In certain situations this ends with two clients holding the lock at the same time.
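The misordering can be shown with a minimal example, assuming znode names of the form x-&lt;sessionId&gt;-&lt;sequence&gt;: comparing full names lexicographically lets the sessionId dominate, while comparing only the sequence suffix restores the intended lock order.

```java
import java.util.Comparator;

// Sketch: lock znodes named x-<sessionId>-<sequence>. Sorting the full name
// lexicographically lets the sessionId dominate; only the 10-digit sequence
// suffix appended by the sequential create should decide the order.
public class ZNodeOrder {
    static long sequenceOf(String znode) {
        return Long.parseLong(znode.substring(znode.lastIndexOf('-') + 1));
    }

    public static final Comparator<String> BY_SEQUENCE =
            Comparator.comparingLong(ZNodeOrder::sequenceOf);
}
```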
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 18 weeks, 1 day ago 0|i3mf73:
ZooKeeper ZOOKEEPER-2930

Leader cannot be elected due to network timeout of some members.

Bug Open Critical Unresolved Unassigned Jiafu Jiang Jiafu Jiang 03/Nov/17 04:16   21/Nov/18 19:50   3.4.10, 3.5.3, 3.4.11, 3.5.4, 3.4.12   leaderElection, quorum, server   1 10 0 3000   Java 8
ZooKeeper 3.4.11(from github)
Centos6.5
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" command.

A new Leader is supposed to be elected within a few seconds, but in fact ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of them can become the new Leader.

I changed the log level to DEBUG (the default is INFO) and restarted the ZooKeeper servers on ofs_zk1 and ofs_zk2 again, but it did not fix the problem.

I read the logs and the ZooKeeper source code, and I think I found the reason.

When the potential leader (say, ofs_zk3) begins the election (FastLeaderElection.lookForLeader()), it sends notifications to all the servers.
If it fails to receive any notification within a timeout, it resends the notifications and doubles the timeout. This process repeats until a notification is received or the timeout reaches a maximum value.
FastLeaderElection.sendNotifications() just puts the notification message into a queue and returns. The WorkerSender is responsible for actually sending the notifications.

The WorkerSender processes the notifications one by one, passing each to QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a long time when the notification is sent to ofs_zk2 (whose network is down), so some notifications (those destined for ofs_zk1) are also blocked for a long time. The repeated notifications from FastLeaderElection.sendNotifications() just make things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
    /*
     * If sending message to myself, then simply enqueue it (loopback).
     */
    if (this.mySid == sid) {
        b.position(0);
        addToRecvQueue(new Message(b.duplicate(), sid));
        /*
         * Otherwise send to the corresponding thread to send.
         */
    } else {
        /*
         * Start a new connection if doesn't have one already.
         */
        ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
        if (bqExisting != null) {
            addToSendQueue(bqExisting, b);
        } else {
            addToSendQueue(bq, b);
        }

        // This may block!!!
        connectOne(sid);
    }
}
{code}

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for the epoch ACK, but in fact ofs_zk1 never receives the notification (which says the leader is ofs_zk3) because ofs_zk3 has not sent it yet (it may still be sitting in the send queue of the WorkerSender). In the end, the potential leader ofs_zk3 fails to receive the epoch ACK within the timeout, so it quits as leader and begins a new election.

The log files of ofs_zk1 and ofs_zk3 are attached.
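One direction for a fix (a hedged sketch, not the actual change in any ZooKeeper release) is to bound the connect attempt so a peer with a downed network interface cannot stall the WorkerSender loop indefinitely:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BoundedConnect {
    // Connect with an explicit timeout instead of blocking indefinitely, so a
    // peer whose network is down cannot stall the calling thread for long.
    public static Socket connect(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket sock = new Socket();
        try {
            sock.connect(addr, timeoutMs);
            return sock;
        } catch (IOException e) {
            sock.close();
            throw e;
        }
    }
}
```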
100% 100% 3000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 3 9223372036854775807
1 year, 17 weeks ago 0|i3mcnj:
ZooKeeper ZOOKEEPER-2929

[DNS Support] Zookeeper client still trying to establish the connection with old IP even after Zookeeper Server restarted with new IP when domain name configured at the client side

Bug Open Major Unresolved Unassigned Pavan Pavan 02/Nov/17 10:20   04/Oct/19 10:55   3.4.9, 3.4.10, 3.5.1, 3.5.3   java client   0 2   1. Zookeeper server deployed as a docker container in Kubernetes
2. In the Java client, the ZooKeeper 'domainname' is configured as the server address
3. Once we restart the ZooKeeper 'POD', the ZooKeeper container starts with a new IP
4. During this time the ZooKeeper client is able to resolve the new IP and make the connection, but it also keeps trying to connect to the old IP. The connection status in netstat shows
SYN_SENT and the connection gets closed

Note: Already applied https://issues.apache.org/jira/browse/ZOOKEEPER-2184 patch
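The behavior one would want on the client side can be sketched as follows: re-resolve the configured hostname on every reconnect attempt rather than reusing an InetAddress cached at startup. The class and method names here are illustrative, not the ZooKeeper client internals.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class HostResolver {
    // Re-resolve the hostname each time, so a server that restarted with a
    // new IP (e.g. a rescheduled Kubernetes pod) is picked up on reconnect.
    public static InetSocketAddress resolve(String host, int port) throws UnknownHostException {
        InetAddress fresh = InetAddress.getByName(host);
        return new InetSocketAddress(fresh, port);
    }
}
```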

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch, Important
1 year, 28 weeks, 4 days ago 0|i3mbb3:
ZooKeeper ZOOKEEPER-2928

pthread_join hang at zookeeper_close

Bug Open Critical Unresolved Unassigned xiaomingzhongguo xiaomingzhongguo 30/Oct/17 03:59   02/Apr/18 17:11   3.4.6   c client   0 2   When zookeeper_close is called,
the thread hangs at pthread_join: the do_io thread no longer exists, but do_completion has not exited.

#0 0x00002b8e38b6b725 in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000cc6b86 in adaptor_finish (zh=0x2aaaaae05240) at src/mt_adaptor.c:285
#2 0x0000000000cc21f3 in zookeeper_close (zh=0x2aaaaae05240) at src/zookeeper.c:2493
#3 0x00000000008eeb04 in ZkAPI::ZkClose ()
#4 0x00000000009270b1 in AgentInfo::zkCloseConnection ()
#5 0x0000000000929e02 in AgentInfo::timeSyncHandler ()
#6 0x00000000010f0585 in event_base_loop (base=0x1679d00, flags=0) at event.c:1350
#7 0x0000000000924f31 in AgentInfo::run ()
#8 0x00000000008998bf in gseThread::run_helper ()
#9 0x0000000000922956 in tos::util_thread_start ()
#10 0x00002b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
#11 0x00002b8e3929ff0d in clone () from /lib64/libc.so.6

#0 0x00002b8e38b6e326 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000cc70be in do_completion (v=0x2aaaaae05240) at src/mt_adaptor.c:463
#2 0x00002b8e38b6a193 in start_thread () from /lib64/libpthread.so.0
#3 0x00002b8e3929ff0d in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 50 weeks, 3 days ago 0|i3lutj:
ZooKeeper ZOOKEEPER-2927

Local session reconnect validation not forward to leader

Improvement Open Minor Unresolved Unassigned Qihong Xu Qihong Xu 28/Oct/17 05:48   28/Oct/17 22:33   3.5.3   java client, quorum, server   0 1   configuration management system based on zookeeper 3.5.3 When zookeeper quorum recovers from shutdown/crash, a client with a local session will reconnect to a random server in quorum. If this random-chosen server is not leader and does not own the local session previously, it will forward this session to leader for validation. And then if this is a global session, leader will update its owner, if not, leader adds Boolean false to packet and does nothing.

Since our system involves mostly local sessions and has a large number of connections, this procedure may be redundant and add pressure on the leader. Would it be reasonable, in the reconnect scenario, for local session validation not to be forwarded to the leader, and instead be answered by the follower directly?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 20 weeks, 5 days ago 0|i3ltrr:
ZooKeeper ZOOKEEPER-2926

Data inconsistency issue due to the flaw in the session management

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 27/Oct/17 19:23   08/Aug/18 03:22 08/Aug/18 00:21 3.5.3, 3.6.0 3.6.0 server   0 7 0 11400   The local session upgrading feature will upgrade the session locally before receiving a quorum commit for creating the global session. It's possible that the server shuts down before the create-session request is sent to the leader; if we retained the ZKDatabase, or a snapshot happened just before shutdown, then only this server will have the global session.

If that server didn't become leader, it will have more global sessions than the others, and those global sessions won't expire because the leader doesn't know of their existence. If the server became leader, it will accept the client's renew-session request, and the client is allowed to create ephemeral nodes, which means other servers have the ephemeral nodes but not that global session. If a follower then does a SNAP sync with it, that follower will also have the global session. If a server without that global session becomes the new leader, it will check and delete those dangling ephemeral nodes before serving traffic. All of this can leave an ephemeral node existing on some servers but not others.

There is a dangling global session issue even without the local session feature, because the leader updates the ZKDatabase when processing a ConnectionRequest and in the PrepRequestProcessor, before the request is quorum committed, which carries the same risk.
100% 100% 11400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 32 weeks, 1 day ago 0|i3ltdr:
ZooKeeper ZOOKEEPER-2925

ZooKeeper server fails to start on first-startup due to race to create dataDir & snapDir

Bug Patch Available Major Unresolved Unassigned Robert P. Thille Robert P. Thille 27/Oct/17 18:16   04/Oct/19 10:55   3.4.6 3.4.10 other   1 7   Due to two threads trying to create the dataDir and snapDir, and the java.io.File.mkdirs() call returning false both for errors and for the directory already existing, sometimes ZooKeeper will fail to start with the following stack trace:

{noformat}
2017-10-25 22:30:40,069 [myid:] - INFO [main:ZooKeeperServerMain@95] - Starting server
2017-10-25 22:30:40,075 [myid:] - INFO [main:Environment@100] - Server environment:zookeeper.version=3.4.6-mdavis8efb625--1, built on 10/25/2017 01:12 GMT

[ More 'Server environment:blah blah blah' messages trimmed]

2017-10-25 22:30:40,077 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/
2017-10-25 22:30:40,081 [myid:] - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.IOException: Unable to create data directory /bp2/data/version-2
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:104)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-10-25 22:30:40,085 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
{noformat}

This is caused by the QuorumPeerMain thread and the PurgeTask thread both competing to create the directories.
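A common way to make the directory creation race-tolerant (a sketch of the idea, not the committed patch) is to treat mkdirs() returning false as an error only when the directory still does not exist afterwards:

```java
import java.io.File;
import java.io.IOException;

public class DirCreation {
    // File.mkdirs() returns false both on real failures and when another
    // thread created the directory first, so re-check isDirectory() before
    // treating false as fatal.
    public static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
```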
easyfix, newbie, patch 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 17 weeks ago 0|i3ltaf:
ZooKeeper ZOOKEEPER-2924

Flaky Test: org.apache.zookeeper.test.LoadFromLogTest.testRestoreWithTransactionErrors

Bug Resolved Major Fixed Andor Molnar Andor Molnar Andor Molnar 26/Oct/17 10:31   12/Dec/17 13:16 12/Dec/17 13:01 3.4.10, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12 server, tests   0 5   From https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1682/

Same issue happens in jdk8 and jdk9 builds as well.

The issue has already been fixed by https://issues.apache.org/jira/browse/ZOOKEEPER-2484 , but I believe that the root cause here is that test startup/cleanup code is inlined in the tests instead of using a try-finally block or Before/After methods.

As a consequence, when an exception happens during test execution, the ZK test server doesn't get shut down properly and is still listening on the port bound to the test class.

As mentioned above, there are two approaches to address this:
#1 Wrap the cleanup code block in a finally
#2 Use JUnit's Before/After methods for initialization and cleanup
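Approach #1 can be sketched with toy stand-ins for the real ZK test server lifecycle (illustrative only; the names are not the actual test harness):

```java
// Toy stand-ins for the ZK test server lifecycle, to illustrate approach #1:
// cleanup in finally runs even when the test body throws, so the server is
// shut down and the port is freed for the next test.
public class TestLifecycle {
    boolean serverRunning = false;

    void startServer() { serverRunning = true; }
    void stopServer() { serverRunning = false; }

    void runTest(Runnable testBody) {
        startServer();
        try {
            testBody.run();
        } finally {
            stopServer(); // runs even on test failure
        }
    }
}
```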

Test where original issue happens:

{noformat}
...
[junit] 2017-10-12 15:05:20,135 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:create cxid:0x8c zxid:0x8d txntype:-1 req$
[junit] 2017-10-12 15:05:20,137 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:create cxid:0x8d zxid:0x8e txntype:-1 req$
[junit] 2017-10-12 15:05:20,139 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:create cxid:0x8e zxid:0x8f txntype:-1 req$
[junit] 2017-10-12 15:05:20,142 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:create cxid:0x8f zxid:0x90 txntype:-1 req$
[junit] 2017-10-12 15:05:20,144 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:create cxid:0x90 zxid:0x91 txntype:-1 req$
[junit] 2017-10-12 15:05:30,479 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x104cd7b190c0000, timeout of 6000ms exceeded
[junit] 2017-10-12 15:05:32,996 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x104cd7b190c0000 type:ping cxid:0xfffffffffffffffe zxid:0xfffff$
[junit] 2017-10-12 15:05:24,147 [myid:] - WARN [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1111] - Client session timed out, have not heard from server in 4002ms for sessionid 0x104cd7b190c0000
[junit] 2017-10-12 15:05:32,996 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1159] - Client session timed out, have not heard from server in 4002ms for sessionid 0x104cd7b190c0000, closing socket connectio$
[junit] 2017-10-12 15:05:21,479 [myid:] - INFO [SessionTracker:SessionTrackerImpl@163] - SessionTrackerImpl exited loop!
[junit] 2017-10-12 15:05:32,998 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x104cd7b190c0000, likely client has closed socket
[junit] 2017-10-12 15:05:33,067 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:45735 which had sessionid 0x104cd7b190c0000
[junit] 2017-10-12 15:05:32,996 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x104cd7b190c0000
[junit] 2017-10-12 15:05:33,889 [myid:] - INFO [main:ZooKeeper@687] - Session: 0x104cd7b190c0000 closed
[junit] 2017-10-12 15:05:33,890 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down for session: 0x104cd7b190c0000
[junit] 2017-10-12 15:05:33,891 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@74] - TEST METHOD FAILED testRestoreWithTransactionErrors
[junit] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /invaliddir/test-
[junit] at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
[junit] at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
[junit] at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:786)
[junit] at org.apache.zookeeper.test.LoadFromLogTest.testRestoreWithTransactionErrors(LoadFromLogTest.java:368)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit] at java.lang.reflect.Method.invoke(Method.java:606)
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
{noformat}

Test #2 where port is already in use:

{noformat}
[junit] 2017-10-12 15:05:33,899 [myid:] - INFO [main:ZKTestCase$1@59] - STARTING testReloadSnapshotWithMissingParent
[junit] 2017-10-12 15:05:33,899 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@53] - RUNNING TEST METHOD testReloadSnapshotWithMissingParent
[junit] 2017-10-12 15:05:33,900 [myid:] - INFO [main:ZooKeeperServer@173] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/$
[junit] 2017-10-12 15:05:33,900 [myid:] - INFO [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
[junit] 2017-10-12 15:05:33,900 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2017-10-12 15:05:33,901 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@74] - TEST METHOD FAILED testReloadSnapshotWithMissingParent
[junit] java.net.BindException: Address already in use
[junit] at sun.nio.ch.Net.bind0(Native Method)
[junit] at sun.nio.ch.Net.bind(Net.java:463)
[junit] at sun.nio.ch.Net.bind(Net.java:455)
[junit] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
[junit] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
[junit] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
[junit] at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
[junit] at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:137)
[junit] at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:130)
[junit] at org.apache.zookeeper.test.LoadFromLogTest.testReloadSnapshotWithMissingParent(LoadFromLogTest.java:412)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit] at java.lang.reflect.Method.invoke(Method.java:606)
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
{noformat}
flaky, flaky-test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 14 weeks, 2 days ago
Reviewed
0|i3lqq7:
ZooKeeper ZOOKEEPER-2923

The comment of the variable matchSyncs in class CommitProcessor has a mistake.

Bug Resolved Minor Fixed Jiafu Jiang Jiafu Jiang Jiafu Jiang 22/Oct/17 22:25   15/Nov/17 17:32 15/Nov/17 16:48 3.4.10, 3.5.3 3.5.4, 3.6.0, 3.4.12 quorum   0 4   The comment of the variable matchSyncs in class CommitProcessor says:


{code:java}
/**
 * This flag indicates whether we need to wait for a response to come back from the
 * leader or we just let the sync operation flow through like a read. The flag will
 * be true if the CommitProcessor is in a Leader pipeline.
 */
boolean matchSyncs;
{code}

I searched the source code and found that matchSyncs is false if the CommitProcessor is in a Leader pipeline, and true if it is in a Follower pipeline.
Therefore I think the comment should be modified to match the code.
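The corrected semantics, as I read the code, can be illustrated with a sketch (not the actual CommitProcessor source):

```java
// Illustrative sketch of the corrected semantics: matchSyncs is true in a
// Follower pipeline (syncs must wait for the leader's response) and false in
// a Leader pipeline (syncs flow through like reads).
public class CommitProcessorRole {
    final boolean matchSyncs;

    CommitProcessorRole(boolean isLeaderPipeline) {
        this.matchSyncs = !isLeaderPipeline;
    }
}
```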
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 18 weeks, 1 day ago
Reviewed
0|i3lkfj:
ZooKeeper ZOOKEEPER-2922

Flaky Test: org.apache.zookeeper.test.LoadFromLogTest

Test Resolved Major Won't Fix Abraham Fine Andor Molnar Andor Molnar 20/Oct/17 04:56   26/Mar/18 15:21 26/Mar/18 15:21     server, tests   0 2   Backport changes of flaky test fix to branch-3.4 :

https://issues.apache.org/jira/browse/ZOOKEEPER-2484
flaky, flaky-test 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 51 weeks, 3 days ago 0|i3li93:
ZooKeeper ZOOKEEPER-2921

fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit

Improvement Open Minor Unresolved Unassigned Jordan Zimmerman Jordan Zimmerman 19/Oct/17 06:09   21/Oct/17 11:34   3.5.3   server   0 2   FileTxnLog.commit() has a warning when an individual sync takes longer than {{fsyncWarningThresholdMS}}. However, it would also be useful to warn when the entire commit operation takes longer than {{fsyncWarningThresholdMS}} as this can cause client connection failures. Currently, commit() can take longer than 2/3 of a session but still not log a warning. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
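The suggested extension can be sketched as follows: time the entire commit, not only each individual force(), and warn when the total exceeds the threshold. Method and variable names here are assumptions for illustration, not FileTxnLog's actual API.

```java
public class CommitTiming {
    // Time the entire commit operation; warn when the total (all forces plus
    // stream flushes) exceeds the threshold, not only an individual force().
    public static long timedCommit(Runnable commitBody, long warnThresholdMs) {
        long start = System.currentTimeMillis();
        commitBody.run();
        long elapsedMs = System.currentTimeMillis() - start;
        if (elapsedMs > warnThresholdMs) {
            System.err.println("commit took " + elapsedMs
                    + "ms, exceeding warning threshold " + warnThresholdMs + "ms");
        }
        return elapsedMs;
    }
}
```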
2 years, 21 weeks, 5 days ago 0|i3lggf:
ZooKeeper ZOOKEEPER-2920

Upgrade OWASP Dependency Check to 3.2.1

Bug Closed Major Fixed Patrick D. Hunt Abraham Fine Abraham Fine 17/Oct/17 17:49   23/Jan/20 13:17 30/May/18 23:58 3.5.4, 3.6.0, 3.4.12 3.6.0, 3.4.13, 3.5.5 build   0 6 0 4800   100% 100% 4800 0 newbie, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 42 weeks ago 0|i3lduf:
ZooKeeper ZOOKEEPER-2919

expired ephemeral node reappears after ZK leader change

Bug Open Major Unresolved Michael Han Jun Rao Jun Rao 16/Oct/17 18:58   03/Sep/19 14:37   3.4.9       0 7   We found the following issue when using ZK. A client (a Kafka broker) registered an ephemeral node in ZK. The client then received a session expiration event and created the new session. The client tried to create the same ephemeral node in ZK in the new session but received a NodeExistException. The following are the details.

From Kafka broker 1:
Broker 1 received the expiration of session 55bcff0f02d0002 at 13:33:26.

{code:java}
[2017-07-29 13:33:26,706] INFO Unable to reconnect to ZooKeeper service, session 0x55bcff0f02d0002 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
{code}

It then established a new session 55d8f690ca20038 at 13:33:33.

{code:java}
[2017-07-29 13:33:33,405] INFO Session establishment complete on server rdalnydbbdqs10/10.122.104.12:2181, sessionid = 0x55d8f690ca20038, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
{code}

However, the re-registration of the broker id fails.

{code:java}
[2017-07-29 13:33:33,408] INFO Result of znode creation is: NODEEXISTS (kafka.utils.ZKCheckedEphemeral)
[2017-07-29 13:33:33,408] ERROR Error handling event ZkEvent[New session event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@74ad6d14] (org.I0Itec.zkclient.ZkEvent
Thread)
java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/1. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:375)
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:361)
at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:71)
at kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:105)
at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:736)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
{code}

From ZK server (my id 4) :
It expired the old session 55bcff0f02d0002 correctly before the broker received the session expiration. It then went into ZK leader election soon after.

{code:java}
[2017-07-29 13:33:26,000] INFO Expiring session 0x55bcff0f02d0002, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2017-07-29 13:33:26,019] INFO Processed session termination for sessionid: 0x55bcff0f02d0002 (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-07-29 13:33:33,582] INFO Shutting down (org.apache.zookeeper.server.quorum.CommitProcessor)
[2017-07-29 13:33:34,344] INFO New election. My id = 4, proposed zxid=0x5830d1163b (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2017-07-29 13:34:22,499] INFO FOLLOWING - LEADER ELECTION TOOK - 48915 (org.apache.zookeeper.server.quorum.Learner)
{code}

From ZK server (my id 5):
It lost the connection to the old session 55bcff0f02d0002 before the session expired. It then went into ZK leader election and became the leader. However, after becoming the leader it did not consider the old session 55bcff0f02d0002 expired. Therefore, the new session 55d8f690ca20038 failed to create /brokers/ids/1. Only after that did it eventually expire the old session 55bcff0f02d0002.

{code:java}
[2017-07-29 13:33:24,216] WARN caught end of stream exception (org.apache.zookeeper.server.NIOServerCnxn)
EndOfStreamException: Unable to read additional data from client sessionid 0x55bcff0f02d0002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2017-07-29 13:33:24,216] INFO Closed socket connection for client /10.122.73.147:59615 which had sessionid 0x55bcff0f02d0002 (org.apache.zookeeper.server.NIOServerCnxn)
[2017-07-29 13:33:30,921] INFO New election. My id = 5, proposed zxid=0x5830d1113f (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2017-07-29 13:33:31,126] INFO LEADING - LEADER ELECTION TOOK - 1122 (org.apache.zookeeper.server.quorum.Leader)
[2017-07-29 13:33:33,405] INFO Established session 0x55d8f690ca20038 with negotiated timeout 6000 for client /10.122.73.147:47106 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-07-29 13:33:33,407] INFO Got user-level KeeperException when processing sessionid:0x55d8f690ca20038 type:create cxid:0x5 zxid:0x5900000352 txntype:-1 reqpath:n/a Error Path:/brokers/ids/1 Error:KeeperErrorCode = NodeExists for /brokers/ids/1 (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-07-29 13:33:40,002] INFO Expiring session 0x55bcff0f02d0002, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2017-07-29 13:33:40,074] INFO Processed session termination for sessionid: 0x55bcff0f02d0002 (org.apache.zookeeper.server.PrepRequestProcessor)
{code}

According to http://mail-archives.apache.org/mod_mbox/zookeeper-user/201701.mbox/%3CB512F6DE-C0BF-45CE-8102-6F242988268E%40apache.org%3E from [~fpj], a ZK client in a new session shouldn't see the ephemeral node created in its previous session. So, could this be a potential bug in ZK during ZK leader transition?
9223372036854775807 No Perforce job exists for this issue. 2 9223372036854775807
28 weeks, 2 days ago 0|i3lbwv:
ZooKeeper ZOOKEEPER-2918

Make ivy dependency report available as a build artifact from jenkins runs

Improvement Open Major Unresolved Abraham Fine Abraham Fine Abraham Fine 09/Oct/17 17:57   09/Oct/17 17:57           0 1   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 23 weeks, 3 days ago 0|i3l1vb:
ZooKeeper ZOOKEEPER-2917

c client doesn't wait for server response upon closing handle, generates EndOfStreamException and CancelledKeyExceptions warnings

Bug Open Major Unresolved Unassigned Steven Raspudic Steven Raspudic 08/Oct/17 16:12   08/Oct/17 16:12   3.4.8       0 2   Basically seeing the same issue as documented in

https://github.com/zk-ruby/zookeeper/pull/54#issuecomment-28764204

e.g.

[myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14252332fe501c9, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:724)
2013-11-19 01:57:53,625 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x14252332fe501c9
2013-11-19 01:57:53,626 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:58253 which had sessionid 0x
14252332fe501c9

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 23 weeks, 4 days ago 0|i3l0dr:
ZooKeeper ZOOKEEPER-2916

ZOOKEEPER-3170 startSingleServerTest may be flaky

Sub-task Open Major Unresolved Bogdan Kanivets Patrick D. Hunt Patrick D. Hunt 07/Oct/17 17:33   22/Nov/18 07:05   3.5.3, 3.6.0   tests   0 4   startSingleServerTest seems to be failing intermittently. 10 times in the first few days of this month. Can someone take a look?
flaky, newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks ago 0|i3l03j:
ZooKeeper ZOOKEEPER-2915

Use "strict" conflict management in ivy

Improvement Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 06/Oct/17 17:38   04/Dec/17 14:29 01/Dec/17 18:31 3.4.11, 3.5.4, 3.6.0 3.4.11, 3.5.4, 3.6.0     0 3   Currently it is very difficult to tell exactly which dependencies make it into the final classpath of zookeeper. We do not perform any conflict resolution between the test and default classpaths (this has resulted in strange behavior with the slf4j-log4j12 binding) and have no way of telling if a change to the dependencies has altered the transitive dependencies pulled down by the project.

Our dependency list is relatively small so we should use "strict" conflict management (break the build when we try to pull two versions of the same dependency) so we can exercise maximum control over the classpath.

Note: I also attempted to find a way to see if I could always prefer transitive dependencies from the default configuration over those pulled by the test configuration (to make sure that the zookeeper we test against has the same dependencies as the one we ship) but this appears to be impossible (or at least incredibly difficult) with ivy. Any opinions here would be greatly appreciated.
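For reference, a sketch of how the "strict" manager can be requested in ivy.xml (the conflicts/manager element placement is my reading of the Ivy docs; verify against the Ivy version in use):

```xml
<ivy-module version="2.0">
  <info organisation="org.apache.zookeeper" module="zookeeper"/>
  <!-- Fail resolution outright when two revisions of the same module meet,
       instead of silently evicting one of them. -->
  <conflicts>
    <manager name="strict"/>
  </conflicts>
  <dependencies>
    <!-- existing dependency declarations unchanged -->
  </dependencies>
</ivy-module>
```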
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 15 weeks, 3 days ago
Reviewed
0|i3kzfr:
ZooKeeper ZOOKEEPER-2914

compiler warning using java 9

Bug Resolved Minor Fixed Andor Molnar Patrick D. Hunt Patrick D. Hunt 04/Oct/17 13:41   08/Oct/17 08:24 07/Oct/17 15:25 3.4.11, 3.5.4, 3.6.0 3.4.11, 3.5.4, 3.6.0 build   0 3   There are a number of warnings that crop up on branch 3.4/3.5/trunk when compiling "ant clean compile-test" using java 9.

Perhaps someone can verify/fix any warnings across the JDKs that we support (jdk6/7/8/9), and 9 in particular since it has just reached GA.
newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 23 weeks, 4 days ago 0|i3kvm7:
ZooKeeper ZOOKEEPER-2913

testEphemeralNodeDeletion is flaky

Bug Closed Major Fixed maoling Patrick D. Hunt Patrick D. Hunt 02/Oct/17 10:57   20/May/19 13:50 05/Sep/18 10:16 3.4.10, 3.5.3, 3.4.11, 3.6.0 3.6.0, 3.5.5 tests   0 5 0 5400   testEphemeralNodeDeletion is showing up as flaky across a number of jobs.

1.https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
2.https://builds.apache.org/job/ZooKeeper_branch34_java9/305/testReport/junit/org.apache.zookeeper.server.quorum/EphemeralNodeDeletionTest/testEphemeralNodeDeletion/

After session close ephemeral node must be deleted
100% 100% 5400 0 flaky, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 28 weeks, 1 day ago 0|i3ks73:
ZooKeeper ZOOKEEPER-2912

Allow arbiter zookeeper nodes to be deployed

New Feature Open Major Unresolved Unassigned Simon Cooper Simon Cooper 02/Oct/17 07:18   02/Oct/17 07:18       leaderElection, quorum, server   1 2   In our system, we're having to deploy a single zookeeper cluster across multiple datacentres. In this situation, we're running into problems with latency across the sites.

One thing that would help is the capability to deploy an arbiter zookeeper node that does not store/update data or serve client requests, cannot become leader, and does not count toward quorum for updates, but does participate in leadership elections (very similar to arbiters for mongo, https://docs.mongodb.com/manual/tutorial/add-replica-set-arbiter/).

This arbiter could then be deployed on a separate arbiter site that did not need a fast network link to the rest of the cluster, but would determine the active cluster in split-brain situations across the 2 main sites.

Currently, there's nothing stopping a zookeeper deployed on the arbiter site from becoming leader, and then the relatively high latencies involved cause problems across the cluster. Observers don't really fit our use case at the moment either.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 24 weeks, 3 days ago 0|i3krsf:
ZooKeeper ZOOKEEPER-2911

Precommit checks are failing because of an issue pertaining to the repo

Bug Open Major Unresolved Unassigned Nikhil Bhide Nikhil Bhide 01/Oct/17 11:56   01/Oct/17 11:56       build-infrastructure   0 1   error: some local refs could not be updated; try running
'git remote prune git://github.com/apache/zookeeper.git' to remove any old, conflicting branches

at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1924)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1643)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:71)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:352)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:181)
at hudson.remoting.UserRequest.perform(UserRequest.java:52)
at hudson.remoting.Request$2.run(Request.java:336)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at ......remote call to H0(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:281)
at hudson.remoting.Channel.call(Channel.java:839)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor748.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy109.execute(Unknown Source)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:815)
... 11 more
ERROR: Error fetching remote repo 'origin'


Refer to [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1074/console] for complete log
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Important
2 years, 24 weeks, 4 days ago 0|i3kr8n:
ZooKeeper ZOOKEEPER-2910

zookeeper listening on unknown random port

Bug Open Major Unresolved Patrick D. Hunt ken huang ken huang 30/Sep/17 05:11   21/Oct/17 22:11   3.4.8   server   0 1   CentOS 7
Zookeeper 3.4.8
JDK 1.8.0_111
When ZooKeeper starts, it listens on three ports:
2181 for client connections
3888 for leader election
and a random port -- for what?
There are three ports configured in zoo.cfg (2181, 2888, 3888), but nothing is listening on 2888.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 21 weeks, 4 days ago 0|i3kqtb:
ZooKeeper ZOOKEEPER-2909

Create ant task to generate ivy dependency reports

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 29/Sep/17 14:14   09/Oct/17 14:35 09/Oct/17 13:58 3.4.10, 3.5.3, 3.6.0 3.4.11, 3.5.4, 3.6.0     0 4   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 23 weeks, 3 days ago 0|i3kq33:
ZooKeeper ZOOKEEPER-2908

quorum.auth.MiniKdcTest.testKerberosLogin failing with NPE on java 9

Bug Resolved Blocker Fixed Mark Fenes Patrick D. Hunt Patrick D. Hunt 29/Sep/17 13:00   06/Mar/18 08:54 05/Oct/17 11:20 3.4.11, 3.5.4 3.4.11, 3.5.4 security, tests   0 2   quorum.auth.MiniKdcTest.testKerberosLogin is failing with an NPE on Java 9.

I recently setup jenkins jobs for java 9 on branch 3.4 and 3.5 and the test is failing as follows.

{noformat}
javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input(s)
at java.base/java.util.Objects.requireNonNull(Objects.java:246)
at java.base/javax.security.auth.Subject$SecureSet.remove(Subject.java:1172)
at java.base/java.util.Collections$SynchronizedCollection.remove(Collections.java:2039)
at jdk.security.auth/com.sun.security.auth.module.Krb5LoginModule.logout(Krb5LoginModule.java:1193)
at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:732)
at java.base/javax.security.auth.login.LoginContext.access$000(LoginContext.java:194)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:665)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:663)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:663)
at java.base/javax.security.auth.login.LoginContext.logout(LoginContext.java:613)
at org.apache.zookeeper.server.quorum.auth.MiniKdcTest.testKerberosLogin(MiniKdcTest.java:179)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)

at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:821)
at java.base/javax.security.auth.login.LoginContext.access$000(LoginContext.java:194)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:665)
at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:663)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:663)
at java.base/javax.security.auth.login.LoginContext.logout(LoginContext.java:613)
at org.apache.zookeeper.server.quorum.auth.MiniKdcTest.testKerberosLogin(MiniKdcTest.java:179)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
{noformat}

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_java9/1/testReport/junit/org.apache.zookeeper.server.quorum.auth/MiniKdcTest/testKerberosLogin/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 23 weeks, 2 days ago 0|i3kq0f:
ZooKeeper ZOOKEEPER-2907

Logged request buffer isn't useful

Improvement Open Minor Unresolved Nishanth Entoor Jordan Zimmerman Jordan Zimmerman 28/Sep/17 16:40   22/Jan/20 16:29   3.4.10, 3.5.3   server   0 1 0 600   There are two places in the server code that log request errors with a message like "Dumping request buffer..." followed by a hex dump of the request buffer. There are two major problems with this output:

# The request type is not output
# The byte-to-hex inline code doesn't pad numbers < 16

These two combine to make the output data nearly useless.

PrepRequestProcessor#pRequest() and FinalRequestProcessor#processRequest()
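To illustrate the second point: without padding, two different buffers can produce the same dump, while padding each byte to two digits keeps the output unambiguous. A self-contained sketch (the helper names are illustrative, not the actual server code):

```java
// Sketch (not the actual server code): why an unpadded byte-to-hex dump is
// ambiguous, and how two-digit padding fixes it.
public class HexDumpDemo {
    // Unpadded: Integer.toHexString emits a single digit for values < 16.
    static String unpadded(byte[] buf) {
        StringBuilder sb = new StringBuilder();
        for (byte b : buf) {
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    // Padded: every byte renders as exactly two hex digits.
    static String padded(byte[] buf) {
        StringBuilder sb = new StringBuilder();
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] a = {0x01, 0x23};
        byte[] b = {0x12, 0x03};
        // Both buffers dump as "123" without padding -- indistinguishable.
        System.out.println(unpadded(a) + " " + unpadded(b));
        // With padding they are clearly different: "0123" vs "1203".
        System.out.println(padded(a) + " " + padded(b));
    }
}
```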
100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 25 weeks ago 0|i3kolb:
ZooKeeper ZOOKEEPER-2906

The OWASP dependency check jar should not be included in the default classpath

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 28/Sep/17 03:29   02/Oct/17 14:58 29/Sep/17 18:56 3.4.11, 3.5.4, 3.6.0 3.4.11, 3.5.4, 3.6.0     0 4   The owasp dependency-check-ant jar that we use contains a SLF4J binding that can break logging. We should move it into a separate classpath. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 24 weeks, 3 days ago 0|i3knhb:
ZooKeeper ZOOKEEPER-2905

Don't include `config.h` in `zookeeper.h`

Bug Resolved Major Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 25/Sep/17 18:41   27/Sep/17 19:34 27/Sep/17 18:19   3.4.11, 3.5.4, 3.6.0     0 4   Linux-ish environments. In ZOOKEEPER-2841 I fixed the inclusion of project-specific porting changes that were included in the public headers, which then broke upstream projects (in my case, Mesos).

Unfortunately, I inadvertently created the exact same problem for Linux (or really any system that uses Autotools), and it wasn't evident until the build was coupled with another project with the same problem. More specifically, when including ZooKeeper (with my changes) in Mesos, and including Google's Glog in Mesos, and building both with Autotools (which we also support), both packages define the pre-processor macro {{PACKAGE_VERSION}}, and do so publicly. This is defined in {{config.h}} by Autotools, and is not a problem _unless included publicly_.

When refactoring, I saw two includes in {{zookeeper.h}} that instead of being guarded by e.g. {{#ifdef HAVE_SYS_SOCKET_H}} were guarded by {{#ifndef WIN32}}. Without realizing that I would create the exact same problem I was elsewhere fixing, I erroneously added {{#include "config.h"}} and guarded the includes "properly." But there are _very good reasons_ not to do this (explained above).

The patch to fix this is simple:

{noformat}
diff --git a/src/c/include/zookeeper.h b/src/c/include/zookeeper.h
index d20e70af4..b0bb09e3f 100644
--- a/src/c/include/zookeeper.h
+++ b/src/c/include/zookeeper.h
@@ -21,13 +21,9 @@

#include <stdlib.h>

-#include "config.h"
-
-#ifdef HAVE_SYS_SOCKET_H
+/* we must not include config.h as a public header */
+#ifndef WIN32
#include <sys/socket.h>
-#endif
-
-#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#endif

diff --git a/src/c/src/zookeeper.c b/src/c/src/zookeeper.c
index 220c57dc4..9b837f227 100644
--- a/src/c/src/zookeeper.c
+++ b/src/c/src/zookeeper.c
@@ -24,6 +24,7 @@
#define USE_IPV6
#endif

+#include "config.h"
#include <zookeeper.h>
#include <zookeeper.jute.h>
#include <proto.h>
{noformat}

I am opening pull requests in a few minutes to have this applied to branch 3.4 and 3.5.

I'm sorry!
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 25 weeks, 1 day ago 0|i3kix3:
ZooKeeper ZOOKEEPER-2904

Remove unused imports from org.apache.zookeeper.server.quorum.WatchLeakTest

Improvement Resolved Trivial Fixed Nikhil Bhide Nikhil Bhide Nikhil Bhide 24/Sep/17 13:38   04/Oct/17 14:35 04/Oct/17 13:27 3.5.3 3.5.4, 3.6.0     0 3   Remove unused imports from org.apache.zookeeper.server.quorum.WatchLeakTest
These imports are not used in the code and do not adhere to code convention and style.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 24 weeks, 1 day ago 0|i3kgu7:
ZooKeeper ZOOKEEPER-2903

ZOOKEEPER-2901 Port ZOOKEEPER-2901 to 3.5.4

Sub-task Resolved Blocker Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 22/Sep/17 18:28   07/Feb/19 05:42 10/May/18 01:00 3.5.3 3.5.4 server   0 2 0 1800   The TTL/Server ID bug is quite serious and should be back-ported to the 3.5.x branch 100% 100% 1800 0 pull-request-available, ttl_nodes 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 45 weeks ago
Reviewed
0|i3kfrz:
ZooKeeper ZOOKEEPER-2902

Exhibitor

Test Resolved Major Invalid Unassigned ANH ANH 21/Sep/17 02:14   21/Sep/17 10:06 21/Sep/17 10:06         0 2   Ubuntu 16.04 Can anyone help me configure Exhibitor, other than by giving this link: https://github.com/soabase/exhibitor ?
Extremely sorry to raise tickets related to Exhibitor.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 26 weeks ago 0|i3kc3z:
ZooKeeper ZOOKEEPER-2901

Session ID that is negative causes mis-calculation of Ephemeral Type

Bug Resolved Blocker Fixed Jordan Zimmerman Mark Johnson Mark Johnson 20/Sep/17 14:54   21/Jan/19 09:52 09/May/18 18:12 3.5.3 3.5.4, 3.6.0 server   0 9   ZOOKEEPER-2903 Running 3.5.3-beta in Docker container. In the code that determines the EphemeralType, it looks at the owner (which is the client ID or connection ID):

EphemeralType.java:

{noformat}
public static EphemeralType get(long ephemeralOwner) {
    if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
        return CONTAINER;
    }
    if (ephemeralOwner < 0) {
        return TTL;
    }
    return (ephemeralOwner == 0) ? VOID : NORMAL;
}
{noformat}

However my connection ID is:

header.getClientId(): -720548323429908480

This causes the code to think this is a TTL Ephemeral node instead of a
NORMAL Ephemeral node.

This also explains why this is random - if my client ID is non-negative
then the node gets added correctly.
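The misclassification can be exercised in isolation. A self-contained sketch of the same dispatch logic (the enum and the Long.MIN_VALUE value of CONTAINER_EPHEMERAL_OWNER mirror my reading of EphemeralType.java; treat them as assumptions, not the shipped class):

```java
// Self-contained sketch of the EphemeralType.get() dispatch described above.
public class EphemeralOwnerDemo {
    enum Type { VOID, NORMAL, CONTAINER, TTL }

    // Assumed constant value, mirroring EphemeralType.java.
    static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;

    static Type get(long ephemeralOwner) {
        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
            return Type.CONTAINER;
        }
        if (ephemeralOwner < 0) {
            return Type.TTL;
        }
        return (ephemeralOwner == 0) ? Type.VOID : Type.NORMAL;
    }

    public static void main(String[] args) {
        // The session id from this report is negative, so it is classified
        // as TTL even though it belongs to an ordinary client session.
        System.out.println(get(-720548323429908480L)); // prints TTL
        // A positive session id is classified as a normal ephemeral owner.
        System.out.println(get(0x14252332fe501c9L));   // prints NORMAL
    }
}
```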
100% 1800 0 ttl_nodes 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 45 weeks, 1 day ago 0|i3kbav:
ZooKeeper ZOOKEEPER-2900

Error: Could not find or load main class com.netflix.exhibitor.application.ExhibitorMain

Test Resolved Critical Invalid Unassigned ANH ANH 18/Sep/17 03:40   21/Sep/17 02:02 18/Sep/17 09:54         0 3   Ubuntu Server 16.04 LTS I am trying to set up Exhibitor on an Ubuntu server using Gradle. The reference links are mentioned below.
1) https://blog.imaginea.com/monitoring-zookeeper-with-exhibitor/
2) https://github.com/soabase/exhibitor/wiki/Running-Exhibitor

java -jar /home/ubuntu/gradle/build/libs/exhibitor-1.6.0.jar -C file -- this command results in an error: { Error: Could not find or load main class com.netflix.exhibitor.application.ExhibitorMain }
What is the solution?
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 26 weeks, 3 days ago 0|i3k6cn:
ZooKeeper ZOOKEEPER-2899

Zookeeper not receiving packets after ZXID overflows

Bug Open Major Unresolved Unassigned Yicheng Fang Yicheng Fang 14/Sep/17 21:00   05/Oct/17 14:18   3.4.5   leaderElection   0 2   5 host ensemble, 1500+ client connections each, 300K+ nodes
OS: Ubuntu precise
JAVA 7
JuniperQFX510048T NIC, 10000Mb/s, ixgbe driver
6 core Intel(R)_Xeon(R)_CPU_E5-2620_v3_@_2.40GHz
4 HDD 600G each
ZK was used with Kafka (version 0.10.0) for coordination. We had a lot of Kafka consumers writing consumption offsets to ZK.

We observed the issue two times within the last year. Each time after ZXID overflowed, ZK was not receiving packets even though leader election looked successful from the logs, and ZK servers were up. As a result, the whole Kafka system came to a halt.

In an attempt to reproduce (and hopefully fix) the issue, I set up test ZK and Kafka clusters and fed them production-like test traffic. Though not really able to reproduce the issue, I did see that the Kafka consumers, which used ZK clients, essentially DOSed the ensemble, filling up the `submittedRequests` queue in `PrepRequestProcessor` and causing read latencies of 100ms and more.

More details are included in the comments.
9223372036854775807 No Perforce job exists for this issue. 6 9223372036854775807
2 years, 24 weeks ago 0|i3k3mf:
ZooKeeper ZOOKEEPER-2898

Zookeeper Management

Improvement Open Major Unresolved Unassigned ANH ANH 14/Sep/17 06:35   18/Sep/17 03:25   3.4.9       0 2   Is there any ZooKeeper management and monitoring tool? The results I got after googling only lead to monitoring tools for ZooKeeper, e.g. zookeeper dashboard, zkfarmer, signalfx, etc. I am already getting the features of these tools by using Sensu. I need a tool like Kafka Manager. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 27 weeks ago 0|i3k253:
ZooKeeper ZOOKEEPER-2897

Jenkins precommit Zookeeper builds are failing for no reason

Bug Open Critical Unresolved Unassigned Nikhil Bhide Nikhil Bhide 14/Sep/17 01:36   14/Sep/17 01:38       build   0 1   Jenkins precommit Zookeeper builds are failing for no reason.
I opened a PR for ZOOKEEPER-2896 and the changes are pretty simple.
The changes are confined to org.apache.zookeeper.test.CreateTest.java and should not break anything, yet the build is failing.
The test results show failures in unrelated tests.

https://github.com/apache/zookeeper/pull/374
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1029/
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 27 weeks ago 0|i3k1sn:
ZooKeeper ZOOKEEPER-2896

Remove unused imports from org.apache.zookeeper.test.CreateTest.java

Improvement Resolved Minor Fixed Nikhil Bhide Nikhil Bhide Nikhil Bhide 13/Sep/17 10:53   27/Sep/17 18:26 27/Sep/17 17:30 3.5.3 3.5.4, 3.6.0 tests   0 3   Remove unused imports from org.apache.zookeeper.test.CreateTest.java 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 25 weeks, 1 day ago 0|i3k0tr:
ZooKeeper ZOOKEEPER-2895

a bug in zkServer.sh

Bug Open Major Unresolved Neha Bathra huangyun huangyun 11/Sep/17 23:51   13/Sep/17 05:08   3.4.10   other   0 3   centos 6.5
jdk1.8.0_131
I deployed ZooKeeper in a cluster of three nodes on three different Linux machines.
All was going well, but I got the output "Error contacting service. It is probably not running" when I executed "zkServer.sh status" to check whether the node was running.
Finally, I found a problem in zkServer.sh:
it reads "clientPort=2181" when executing "zkServer.sh status", but the correct string is only "2181". So the problem can be solved by removing "clientPort=" from "clientPort=2181".

I hope you understand me despite my poor English.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 27 weeks, 1 day ago 0|i3jy3b:
ZooKeeper ZOOKEEPER-2894

Memory and completions leak on zookeeper_close

Bug Closed Critical Fixed Alexander A. Strelets Alexander A. Strelets Alexander A. Strelets 08/Sep/17 07:26   16/Oct/19 14:59 08/Jul/19 20:22 3.4.10 3.6.0, 3.4.15, 3.5.6 c client   1 7 0 18600   Linux ubuntu 4.4.0-87-generic
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

https://github.com/apache/zookeeper.git
branch-3.4
ZooKeeper C Client *+single thread+* build

*The problem:*

First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in two ways:
a) from a ZooKeeper callback handler (completion or watcher) which in turn is called through _zookeeper_process()_
b) and from other places -- i.e., when the call-stack does not pass through any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_

The issue described here below is +specific only to the case (b)+. So, it's Ok with the case (a).

When _zookeeper_close()_ is called in the (b) way, the following happens:
1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, they all are removed from this queue and each of them is "completed" with personal fake response having status ZCLOSING. Such fake responses are put into _zh.completions_to_process_ queue. It's Ok
2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed+*
3. Different structures within _zh_ are dismissed and finally _zh_ is freed

This is illustrated on the screenshot attached to this ticket: you may see that the next instruction to execute will be _free(zh)_ while _zh.completions_to_process_ queue is not empty (see the "Variables" tab to the right).

Alternatively, the same situation but in the case (a) is handled properly -- i.e., all completion callback handlers are truly called with ZCLOSING and the memory is freed, both for subcases (a.1) when there is a failure like connection-timeout, connection-closed, etc., and (a.2) when there is no failure. The reason is that any callback handler (completion or watcher) in the case (a) is called from the _process_completions()_ function, which runs in a loop until the _zh.completions_to_process_ queue gets empty. So, this function guarantees that this queue is completely processed even if new completions occur while reacting to previously queued completions.

*Consequently:*
1. At least there is definitely the +memory leak+ in the case (b) -- all the fake responses put into _zh.completions_to_process_ queue are lost after _free(zh)_
2. And it looks like a great misbehavior not to call completions on sent requests in the case (b) while they are called with ZCLOSING in the case (a) -- so, I think it's not "by design" but a +completions leak+

+To reproduce the case (b) do the following:+
- open ZooKeeper session, connect to a server, receive and process connected-watch, etc.
- then somewhere +from the main events loop+ call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK
- then, +immediately after it returned+, call _zookeeper_close()_
- note that completion callback handler for _zoo_acreate()_ *will not be called*

+To reproduce the case (a) do the following:+
- the same as above, open ZooKeeper session, connect to a server, receive and process connected-watch, etc.
- the same as above, somewhere from the main events loop call for example _zoo_acreate()_ with valid arguments -- it shall return ZOK
- but now don't call _zookeeper_close()_ immediately -- wait for completion callback on the commenced request
- when _zoo_acreate()_ completes, +from within its completion callback handler+, call another _zoo_acreate()_ and immediately after it returned call _zookeeper_close()_
- note that completion callback handler for the second _zoo_acreate()_ *will be called with ZCLOSING, unlike the case (b) described above*

*To fix this I propose:*
Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done in _handle_error(zhandle_t *zh,int rc)_.

This is a proposed fix: https://github.com/apache/zookeeper/pull/1000
// Previously proposed fix: https://github.com/apache/zookeeper/pull/363

[upd]
There are other tickets describing roughly the same problem: ZOOKEEPER-1493, ZOOKEEPER-2073 (the "same" judging by their titles).
However, as far as I can see, the corresponding patches were applied on branch 3.4.10, yet the effect still persists -- so this ticket does not duplicate the two mentioned.
100% 100% 18600 0 easyfix, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
Patch, Important
36 weeks, 2 days ago 0|i3jtcn:
ZooKeeper ZOOKEEPER-2893

very poor choice of logging if client fails to connect to server

Bug Resolved Major Fixed Andor Molnar Paul Millar Paul Millar 07/Sep/17 05:50   04/Oct/19 10:55 19/Dec/17 14:05 3.4.6, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12 java client   1 7   We are using ZooKeeper in our project and have received reports that, when suffering a networking problem, log files become flooded with messages like:
{quote}
07 Sep 2017 08:22:00 (System) [] Session 0x45d3151be3600a9 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_131]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[na:1.8.0_131]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
{quote}

Looking at the code that logs this message ({{ClientCnxn}}), there seems to be quite a few problems here:
# the code logs a stack-trace, even though there is no bug here. In our project, we treat all logged stack-traces as bugs,
# if the networking issue is not fixed promptly, the log files are flooded with these messages,
# The message is built using {{ClientCnxnSocket#getRemoteSocketAddress}}, yet in this case, this does not provide the expected information (yielding {{null}}),
# The log message fails to include a description of what actually went wrong.

(Additionally, the code uses string concatenation rather than templating when building the message; however, this is an optimisation issue)

My suggestion is that this log entry is updated so that it doesn't log a stack-trace, but does include some indication why the connection failed.
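For illustration, such a log line could look like the sketch below: no stack trace, the failure reason inlined, and a placeholder when the remote address has not yet been resolved. The method name, wording, and use of java.text.MessageFormat are my assumptions for a self-contained example; ClientCnxn itself logs via SLF4J, where "{}" placeholders defer formatting:

```java
import java.text.MessageFormat;

// Sketch of the suggested one-line warning described above (illustrative
// names and wording, not the actual ClientCnxn code).
public class ReconnectLogDemo {
    static String reconnectMessage(long sessionId, String remoteAddr, Exception cause) {
        return MessageFormat.format(
                "Session 0x{0} for server {1}: {2}; closing socket connection and attempting reconnect",
                Long.toHexString(sessionId),
                remoteAddr == null ? "<address not yet resolved>" : remoteAddr,
                cause.getMessage());
    }

    public static void main(String[] args) {
        Exception cause = new java.net.NoRouteToHostException("No route to host");
        // One compact line per failed attempt instead of a stack trace.
        System.out.println(reconnectMessage(0x45d3151be3600a9L, null, cause));
    }
}
```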
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 13 weeks, 2 days ago
Reviewed
0|i3jr6f:
ZooKeeper ZOOKEEPER-2892

Improve lazy initialize and close stream for `PrepRequestProcessor`

Improvement Resolved Major Fixed Benedict Jin Benedict Jin Benedict Jin 05/Sep/17 21:58   08/Feb/19 10:32 07/Feb/19 05:23   3.6.0 server   0 3 0 4800   Improve lazy initialize and close stream for `PrepRequestProcessor`

* Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
* Close the `ByteArrayOutputStream` I/O stream
100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 6 weeks ago 0|i3joo7:
ZooKeeper ZOOKEEPER-2891

Invalid processing of zookeeper_close for mutli-request

Bug Closed Critical Fixed Alexander A. Strelets Alexander A. Strelets Alexander A. Strelets 05/Sep/17 08:41   16/Oct/19 14:59 05/Aug/19 20:00 3.4.10 3.6.0, 3.4.15, 3.5.6 c client   1 5 0 24000   Linux ubuntu 4.4.0-87-generic
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

https://github.com/apache/zookeeper.git
branch-3.4
ZooKeeper C Client *+single thread+* build

When I call _zookeeper_close()_ while there is a pending _multi_ request, I expect the request completes with _ZCLOSING_ (-116) status.

But with the existing code I actually get the following:
- the program exits with _SIGABRT_ from _assert(entry)_ in _deserialize_multi()_
- and even if I remove this assertion and just break the enclosing loop, the returned status is _ZOK_ but not _ZCLOSING_

So, there are two defects with processing calls to _zookeeper_close()_ for pending _multi_ requests: improper assertion in implementation and invalid status in confirmation.

*+I propose two changes in the code:+*

1. Screen _assert(entry)_ in _deserialize_multi()_ when it's called due to _zookeeper_close()_ (note that _entry_ may normally become equal to _NULL_ in this case), and as soon as _entry == NULL_, break the loop. To provide this, _deserialize_multi()_ must be informed by the caller whether it's currently the "normal" or the "special" case.

I propose adding a new parameter _rc_hint_ (return code hint) to _deserialize_multi()_. When _deserialize_multi()_ is called in "normal" case, _rc_hint_ is preset with _ZOK_ (0), and the behavior is absolutely the same as with the existing code. But when it's called due to _zookeeper_close()_, the _rc_hint_ is automatically preset with _ZCLOSING_ (-116) by the caller, and this changes the behavior of _deserialize_multi()_ as described above.

How it works:
Let _zookeeper_close()_ is called while there is a pending _multi_ request. Then function _deserialize_multi()_ is called for the so-called "Fake response" on _multi_ request which is fabricated by the function _free_completions()_. Such fake response includes only the header but zero bytes for the body. Due to this _deserialize_MultiHeader(ia, "multiheader", &mhdr)_, which is called repeatedly for each _completion_list_t *entry = dequeue_completion(clist)_, does not assign the _mhdr_ and keeps _mhdr.done == 0_ as it was originally initialized. Consequently the _while (!mhdr.done)_ loop does not ever end, and finally falls into the _assert(entry)_ with _entry == NULL_ when all fake sub-requests are "completed".
But if, as I propose, the caller made a hint to _deserialize_multi()_ that it's actually the "special" case (that it processes the fake response indeed, for example), with the proposed changes it would omit improper assertion and break the loop on the first _entry == NULL_. Now at least _deserialize_multi()_ exits and does not emit _SIGABRT_.

2. Pass the "return code hint" _rc_hint_, as initially specified by the caller, through to the _deserialize_multi()_ return code if the hint is not _ZOK_ (0).

How it works:
With the existing code, _deserialize_multi()_ returns an unsuccessful _rc_ code only if there is an error in processing one of the sub-requests. If there are no errors, it returns _ZOK_ (0), which is assigned as the default value of _rc_ at the very beginning of the function. In the case of a fake multi-response there are indeed no errors in the sub-responses (because they are empty and fake), so _deserialize_multi()_ returns _ZOK_ (0). Then, via _rc = deserialize_multi(xid, cptr, ia)_ in _deserialize_response()_, this overrides the true _ZCLOSING_ status.
But if the true status (for example _ZCLOSING_) is hinted to _deserialize_multi()_ up front, as I propose, _deserialize_multi()_ returns it instead of the irrelevant _ZOK_ (0). Consequently _deserialize_response()_ finally reports the true status (in particular _ZCLOSING_).
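To make the two changes concrete, here is a minimal, self-contained sketch of the proposed _rc_hint_ behavior. The types and the loop-termination logic are simplified stand-ins for the real C client structures (_completion_list_t_, _mhdr.done_), not the actual implementation:

```c
#include <assert.h>
#include <stddef.h>

#define ZOK      0
#define ZCLOSING (-116)

typedef struct completion_list { struct completion_list *next; } completion_list_t;

static completion_list_t *dequeue_completion(completion_list_t **clist) {
    completion_list_t *head = *clist;
    if (head) *clist = head->next;
    return head;
}

/* rc_hint == ZOK: normal response, entry must never be NULL (assert stays).
   rc_hint != ZOK (e.g. ZCLOSING from zookeeper_close()): fake response, so a
   NULL entry just means "no more sub-requests" -- break instead of aborting. */
static int deserialize_multi(completion_list_t **clist, int rc_hint) {
    int rc = rc_hint;   /* change 2: pre-seed rc with the caller's hint     */
    int mhdr_done = 0;  /* mhdr.done stays 0 for the zero-byte fake body    */
    while (!mhdr_done) {
        completion_list_t *entry = dequeue_completion(clist);
        if (rc_hint != ZOK && entry == NULL)
            break;      /* change 1: tolerate NULL in the "special" case    */
        assert(entry);  /* the "normal" case keeps the strict check         */
        /* a real multi header would eventually set mhdr_done; the fake body
           never does, which is the whole problem -- model that here:       */
        if (rc_hint == ZOK && *clist == NULL)
            mhdr_done = 1;
    }
    return rc;          /* ZCLOSING is propagated instead of a bogus ZOK    */
}
```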

This is a proposed fix: https://github.com/apache/zookeeper/pull/999
// Previously proposed fix: https://github.com/apache/zookeeper/pull/360

[upd]
It looks like roughly the same problem is described in ZOOKEEPER-1636.
However, the patch proposed in this ticket also remedies the second linked problem: reporting the _ZCLOSING_ status (as required) to the multi-request completion handler.
100% 100% 24000 0 easyfix, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch, Important
32 weeks, 2 days ago 0|i3jnkn:
ZooKeeper ZOOKEEPER-2890

Local automatic variable is left uninitialized and then freed.

Bug Resolved Critical Fixed Alexander A. Strelets Alexander A. Strelets Alexander A. Strelets 05/Sep/17 06:53   07/Feb/19 10:03 10/Oct/17 14:51 3.4.10, 3.5.3, 3.6.0 3.4.11, 3.5.4, 3.6.0 c client   0 7 0 3600   Linux ubuntu 4.4.0-87-generic
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

https://github.com/apache/zookeeper.git
branch-3.4
ZooKeeper C Client *+single thread+* build

Function *_deserialize_response()_*, in _case COMPLETION_STRING_, uses the local automatic variable *_struct CreateResponse res_*, which is +left uninitialized+ and passed to the function _deserialize_CreateResponse()_ and then to _deallocate_CreateResponse()_.

The _deserialize_ function, which is called first, is expected to assign the _res_ variable a value from the parsed _struct iarchive *ia_. But if _ia_ contains, for example, an insufficient number of bytes, the _deserialize_String()_ function refuses to assign a value to _res_, and _res_ stays uninitialized (the actual case is described below). Then the _deallocate_ function calls _deallocate_String()_, passing the uninitialized _res_ as an argument. If the memory region on the program stack under _res_ happened to be non-NULL, this call +leads to _free()_ on an invalid address+.

The actual case: this happens when an active _multi_ request containing a _create_ sub-request is completed on a call to _zookeeper_close()_ with the so-called "fake response" fabricated by the function _free_completions()_. Such a response includes only the header and +zero bytes for the body+. The significant condition is that the _create_ request is not a stand-alone request, but a sub-request within the _multi_ request. In this case _deserialize_response()_ is called recursively (for each sub-request), and when it is called for the _create_ sub-request (from the nested _deserialize_multi()_), the _failed_ parameter is set to false (0), so the _if (failed)_ condition branches to the _else_ part. Note that this does not occur in the stand-alone create-request case.

*I suspect this may happen not only on a call to _zookeeper_close()_ but also on reception of a genuine multi-response from the server* containing an insufficient number of bytes (I'm not sure whether the server can properly send a response with an error overall status and an empty or insufficient payload).

This is a proposed fix: https://github.com/apache/zookeeper/pull/359
100% 100% 3600 0 easyfix, pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch, Important
2 years, 23 weeks, 2 days ago 0|i3jnen:
ZooKeeper ZOOKEEPER-2889

Zookeeper standalone instance startup references logging classes incompatible with log4j-1.2-api

Bug Open Major Unresolved Unassigned Karl Wright Karl Wright 02/Sep/17 04:58   06/Sep/17 08:18   3.4.8       1 2   Starting Zookeeper in the following way causes "ClassNotFoundException" errors, and aborts, in a log4j 2.x environment:

{code}
"%JAVA_HOME%\bin\java" %JAVAOPTIONS% org.apache.zookeeper.server.quorum.QuorumPeerMain zookeeper.cfg
{code}

The log4j 2.x jars in the classpath are:

{code}
log4j-1.2-api
log4j-core
log4j-api
{code}

It appears that the Zookeeper QuorumPeerMain class is incompatible with the limited log4j 1.2 API that log4j 2.x includes. Zookeeper 3.4.8 works fine with log4j 2.x except when you start it as a service in this way.

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 28 weeks, 1 day ago 0|i3jkvj:
ZooKeeper ZOOKEEPER-2888

Reconfig Command Isolates One of the Nodes when All Ports Change

Bug Open Major Unresolved Unassigned Cesar Stuardo Cesar Stuardo 01/Sep/17 15:52   01/Sep/17 15:59   3.5.3   quorum   0 2   When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3 by following the workload (complete details attached):

1. start a 5 node cluster (all nodes know each other).
2. wait for the cluster to reach a steady state.
3. issue reconfig command which does not add or remove nodes but changes all the ports of the existing cluster (no role change either).

We observe that in some situations one of the followers may end up isolated, since the other nodes change their ports and end up setting up new connections. The consequence is similar to the one in [ZK-2865|https://issues.apache.org/jira/browse/ZOOKEEPER-2865?jql=], but the scenario is different.

We provide further details in the attached document.

9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 28 weeks, 6 days ago 0|i3jkb3:
ZooKeeper ZOOKEEPER-2887

define dependency versions in build.xml to be easily overridden in build.properties

Improvement Resolved Major Fixed Tamas Penzes Tamas Penzes Tamas Penzes 01/Sep/17 12:34   04/Oct/17 13:32 27/Sep/17 23:10   3.4.11, 3.5.4, 3.6.0 build   0 3   Dependency versions are defined in ivy.xml, which is suboptimal since it is hard to override them from a script.

If we defined the versions in the main build.xml (just as we do with audience-annotations.version) and used variables in ivy.xml, we could easily override the versions by creating a build.properties file, a mechanism that is already built in.
The dependency versions could then be replaced by sed or any simple command-line tool.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 24 weeks, 1 day ago 0|i3jk3z:
ZooKeeper ZOOKEEPER-2886

Permanent session moved error in multi-op only connections

Bug Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 30/Aug/17 14:39   10/Jul/18 15:27 10/Jul/18 07:03 3.4.10, 3.5.3, 3.6.0 3.6.0 server   0 6 0 4800   If there are slow followers, it's possible that the leader and the client disagree on where the client is connected, so the client keeps getting "Session Moved" errors. Part of the issue was fixed in ZOOKEEPER-710, but the problem remains for multi-op-only connections. 100% 100% 4800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 36 weeks, 2 days ago 0|i3jgpr:
ZooKeeper ZOOKEEPER-2885

zookeeper-3.5.3-beta.tar.gz file in mirror site is corrupted

Bug Resolved Critical Fixed Michael Han Gabriel Gabriel 29/Aug/17 20:02   22/Jun/18 18:37 30/Aug/17 06:44 3.5.3 3.5.3     2 6   I downloaded the zookeeper-3.5.3-beta.tar.gz file from several mirror sites and all of them are corrupted.

{quote}$ wget http://www-us.apache.org/dist/zookeeper/zookeeper-3.5.3-beta/zookeeper-3.5.3-beta.tar.gz
$:~/dockerfiles$ tar -xzvf zookeeper-3.5.3-beta.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now{quote}

If this is my mistake, could you please explain what I did wrong?

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 28 weeks, 1 day ago Thank you all. It's a POSIX tar file, not gzip. My mistake. 0|i3jfgf:
ZooKeeper ZOOKEEPER-2884

Failed to connect to zookeeper instance within 2x zookeeper timeout period 30000

Task Open Major Unresolved Unassigned sharvari sharvari 28/Aug/17 11:02   28/Aug/17 11:02           0 1   I am new to GeoMesa. Our GeoMesa deployment runs on Linux; the stack consists of Hadoop, Accumulo, and ZooKeeper. I am trying to run the create-schema command for GeoMesa and I am getting the error below

~# geomesa create-schema -uxxx -p xxx -i instance -z xxZoo01 -c test_create -f testing -s fid:String:index=true,dtg:Date,geom:Point:srid=4326 --dtg dtg
ERROR Failed to connect to zookeeper (srfZoo01) within 2x zookeeper timeout period 30000

I would really appreciate your help.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 29 weeks, 3 days ago 0|i3jcsn:
ZooKeeper ZOOKEEPER-2883

no null check for the pointer which returned by allocate_buffer() function

Bug Patch Available Minor Unresolved guoxiang niu guoxiang niu guoxiang niu 23/Aug/17 11:03   24/Aug/17 00:46       c client   0 2   1. In the check_events() function there is no null check for the pointer returned by allocate_buffer(); the pointer is passed to recv_buffer(), where its curr_offset member is accessed directly.

2. In queue_session_event(), curr_offset is also accessed directly without a null check.
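A hedged sketch of the kind of null check the report asks for; the types and names below are simplified stand-ins for the C client's buffer handling, not the real code:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the C client's buffer type and allocator. */
typedef struct buffer { char *buffer; int len; int curr_offset; } buffer_t;

static buffer_t *allocate_buffer(char *data, int len) {
    buffer_t *b = calloc(1, sizeof(*b));
    if (!b) return NULL;        /* out of memory: caller must handle NULL */
    b->buffer = data;
    b->len = len;
    return b;
}

/* Before: bptr->curr_offset was touched without checking bptr, so a failed
   allocation dereferenced NULL. After: bail out with an error code. */
static int recv_buffer_checked(buffer_t *bptr) {
    if (bptr == NULL)
        return -1;              /* propagate the failure instead of crashing */
    bptr->curr_offset += 4;     /* e.g. account for the length prefix        */
    return 0;
}
```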
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 30 weeks, 1 day ago 0|i3j71j:
ZooKeeper ZOOKEEPER-2882

memory leak in zoo_amulti() function

Bug Patch Available Minor Unresolved guoxiang niu guoxiang niu guoxiang niu 23/Aug/17 10:55   11/Sep/17 18:39       c client   0 2   When the default branch of switch(op->type) is executed, the memory allocated for the oa variable leaks; close_buffer_oarchive(&oa, 1); should be called before returning from the default branch. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
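A simplified model of the suggested fix, with stand-in types; it only illustrates the pattern of releasing _oa_ on the default branch before returning, not the actual zoo_amulti() code:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in: an output archive that owns a heap buffer. */
typedef struct oarchive { char *buf; } oarchive;

static oarchive *create_buffer_oarchive(void) {
    oarchive *oa = malloc(sizeof(*oa));
    if (!oa) return NULL;
    oa->buf = malloc(128);
    return oa;
}

static void close_buffer_oarchive(oarchive **oa, int free_buffer) {
    if (free_buffer) free((*oa)->buf);
    free(*oa);
    *oa = NULL;
}

/* Models the zoo_amulti() switch: on an unknown op type the function used
   to return without releasing oa; freeing it first plugs the leak. */
static int serialize_op(int op_type) {
    oarchive *oa = create_buffer_oarchive();
    if (!oa) return -1;
    switch (op_type) {
    case 1:                              /* known op: serialize into oa ... */
        close_buffer_oarchive(&oa, 1);
        return 0;
    default:
        close_buffer_oarchive(&oa, 1);   /* the fix: free before returning  */
        return -2;                       /* unknown op type                 */
    }
}
```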
2 years, 27 weeks, 3 days ago 0|i3j70v:
ZooKeeper ZOOKEEPER-2881

memory leak in create_buffer_iarchive() and create_buffer_oarchive()

Bug Patch Available Minor Unresolved Unassigned guoxiang niu guoxiang niu 23/Aug/17 10:46   23/Aug/17 11:17       c client   0 1   In the create_buffer_iarchive() function, the null check of ia (and the early return) should be done before allocating memory for buff; otherwise the memory for buff might leak.

The same issue exists in the create_buffer_oarchive() function.
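A sketch of the suggested ordering, using a simplified stand-in for the archive type: check each allocation before making the next one, so no earlier allocation can leak on a later failure:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for the input-archive type. */
struct iarchive { char *buff; int len; };

/* The reported leak: the original code allocated buff first, then
   allocated/checked ia, leaking buff when ia could not be created.
   Allocating (and checking) ia before buff removes that window. */
static struct iarchive *create_buffer_iarchive_fixed(int len) {
    struct iarchive *ia = malloc(sizeof(*ia));
    if (ia == NULL)
        return NULL;        /* nothing else allocated yet: no leak   */
    ia->buff = malloc(len);
    if (ia->buff == NULL) {
        free(ia);           /* roll back the one allocation we made  */
        return NULL;
    }
    ia->len = len;
    return ia;
}
```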
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 30 weeks, 1 day ago 0|i3j70f:
ZooKeeper ZOOKEEPER-2880

Rename README.txt to README.md

Improvement Resolved Minor Fixed Manoj Mallela Michael Han Michael Han 21/Aug/17 22:18   01/Nov/17 13:55 23/Aug/17 20:12   3.4.11, 3.5.4, 3.6.0 other   0 4   This task is to rename README.txt to README.md so GitHub can render the markdown. The added benefit is that https://github.com/apache/zookeeper will look cooler as we add more markdown to the README file... newbie 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 20 weeks, 1 day ago 0|i3j4en:
ZooKeeper ZOOKEEPER-2879

Adding observers dynamically without server id

Improvement Open Major Unresolved Fangmin Lv Fangmin Lv Fangmin Lv 21/Aug/17 13:35   02/Feb/19 21:11   3.6.0   quorum   0 5 0 1800   Dynamic config requires that each observer have a unique server id, which means we cannot simply add an observer with the dynamic server id -1. For a large observer cluster, it's much easier to add an observer without a unique server id if it doesn't need to be promoted to participant. It also makes dynamic config more efficient: we don't need to store and send the long list of observers during re-config. 100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 29 weeks, 1 day ago 0|i3j3pj:
ZooKeeper ZOOKEEPER-2878

some issues in c code of lock of recipes

Bug Open Minor Unresolved Unassigned H Y H Y 21/Aug/17 02:33   21/Aug/17 02:33       recipes   0 3   There are three issues in the c code of lock.
1. It is not thread-safe: pmutex is the local mutex of zkr_lock_mutex_t, so if more than one thread calls zkr_lock_lock(), watching the child node may fail (retry_zoowexists may return non-ZOK and 'unable to watch my predecessor' is printed). I suggest changing pmutex to a global mutex.
2. The child_floor() function is not correct; it should compare the sequence suffixes of the nodes (zoo_lock.c line 145 should be 'if (strcmp((sorted_data[i] + 19), (element + 19)) < 0)').
3. There is a logic mistake in zkr_lock_operation() of zoo_lock.c at line 256: mutex->id should be allocated by the getName() function, so I think lines 249 to 257 should be deleted.
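For point 2, the idea of comparing only the sequence suffix can be sketched as follows. The 19-character offset and the node-name layout are taken from the report and are assumptions about the recipe's naming scheme, not verified against zoo_lock.c:

```c
#include <assert.h>
#include <string.h>

/* The lock recipe's child nodes look roughly like "x-<session>-<10-digit seq>".
   Comparing whole names sorts by session id first; the reported fix compares
   only the numeric sequence suffix. SEQ_OFFSET corresponds to the "+ 19" in
   the report (length of the fixed-width "x-<session>-" prefix). */
#define SEQ_OFFSET 19

static int seq_less_than(const char *a, const char *b) {
    /* Fixed-width zero-padded sequence digits, so strcmp orders numerically. */
    return strcmp(a + SEQ_OFFSET, b + SEQ_OFFSET) < 0;
}
```

With whole-name strcmp, "x-2222222222222222-0000000001" would sort after "x-1111111111111111-0000000002" even though its sequence number is lower; comparing the suffixes gives the intended order.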
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 30 weeks, 3 days ago 0|i3j2sv:
ZooKeeper ZOOKEEPER-2877

ZOOKEEPER-3170 Flaky Test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun

Sub-task Open Major Unresolved Unassigned Michael Han Michael Han 18/Aug/17 00:01   21/Nov/18 21:26       tests   0 4   {noformat}
Error Message

expected:<1> but was:<0>
Stacktrace

junit.framework.AssertionFailedError: expected:<1> but was:<0>
at org.apache.zookeeper.server.quorum.Zab1_0Test$6.converseWithLeader(Zab1_0Test.java:939)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderConversation(Zab1_0Test.java:398)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun(Zab1_0Test.java:906)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
{noformat}
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks ago 0|i3ixuf:
ZooKeeper ZOOKEEPER-2876

Github pull request test script should output -1 when there is no tests provided in patch, unless the subject under test is a documentation JIRA

Bug Open Major Unresolved Unassigned Michael Han Michael Han 16/Aug/17 17:52   16/Aug/17 17:52       build-infrastructure, tests   0 2   The github pull request test script (which is invoked as part of pre-commit workflow) should output -1 on a patch which does not include any tests, unless the patch is a documentation only patch.

We had this expected behavior before, when we used the old PATCH approach:
{noformat}
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{noformat}

A quick look at the [script|https://github.com/apache/zookeeper/blob/master/src/java/test/bin/test-github-pr.sh#L224] indicates that we do not set up the $PATCH/jira directory in the github pull request test script, so it always thinks an incoming pull request is a documentation-only patch. This should be fixed so we get the old behavior back and enforce that any new pull request must have tests unless it is explicitly justified why none are needed.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 31 weeks, 1 day ago 0|i3ivhr:
ZooKeeper ZOOKEEPER-2875

Add ant task for running OWASP dependency report

New Feature Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 15/Aug/17 03:04   16/Sep/17 15:50 11/Sep/17 00:35 3.4.10, 3.5.3, 3.6.0 3.4.11, 3.5.4, 3.6.0     0 3   The OWASP dependency check is a tool "that identifies project dependencies and checks if there are any known, publicly disclosed, vulnerabilities". We could run this tool periodically to make sure that we are not shipping any security vulnerabilities through our dependencies. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 27 weeks, 3 days ago 0|i3isnb:
ZooKeeper ZOOKEEPER-2874

Windows Debug builds don't link with `/MTd`

Bug Resolved Major Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 11/Aug/17 18:02   18/Aug/17 01:35 18/Aug/17 00:21   3.4.11, 3.5.4, 3.6.0     0 5   Windows 10 using CMake While not apparent when building ZooKeeper stand-alone, further testing when linking with Mesos revealed it was ZooKeeper that was causing the warning:

{noformat}
LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' conflicts with use of other libs; use /NODEFAULTLIB:library [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj]
{noformat}

Mesos links with {{/MTd}} in its Debug configuration (which is the most common practice), hence the conflict.

Once I found the source of the warning, the fix is trivial and I am posting a patch.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 30 weeks, 6 days ago 0|i3ippj:
ZooKeeper ZOOKEEPER-2873

print error and/or abort on invalid server definition

Improvement Closed Minor Fixed Norbert Kalmár Christopher Smith Christopher Smith 10/Aug/17 17:37   17/Jul/18 00:50 04/Jul/18 11:12 3.4.10 3.4.13, 3.5.5 server   1 5 0 9000   While bringing up a new cluster, I managed to fat-finger a sed script and put some lines like this into my config file:

{code}
server.1=zookeeper1:2888:2888
{code}

This led to a predictable spew of error messages as the client and election components fought over the single port. Since a configuration like this is *always* an error, I suggest it would be sensible to abort server startup if an entry is found with the same port for both client and election. (Logging the error explicitly without shutting down is less helpful because of how fast the logs pile up.)
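A sketch of the suggested startup validation. The parsing and the function name are illustrative only (ZooKeeper's real parser lives in QuorumPeerConfig); the point is simply to reject an entry whose two ports are equal before the server starts:

```c
#include <assert.h>
#include <stdio.h>

/* Hedged sketch: parse a "server.N=host:port1:port2" line and reject it
   when the two ports are equal, so the server can abort at startup
   instead of spewing errors at runtime. */
static int server_entry_valid(const char *line) {
    char host[256];
    int port1 = 0, port2 = 0;
    if (sscanf(line, "server.%*d=%255[^:]:%d:%d", host, &port1, &port2) != 3)
        return 0;            /* unparsable entry: also an error        */
    return port1 != port2;   /* the same port twice is always an error */
}
```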
100% 100% 9000 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 37 weeks, 1 day ago 0|i3inyn:
ZooKeeper ZOOKEEPER-2872

Interrupted snapshot sync causes data loss

Bug Open Major Unresolved Unassigned Brian Nixon Brian Nixon 10/Aug/17 14:07   15/Nov/19 09:45   3.4.10, 3.5.3, 3.6.0   server   0 6 0 600   Observers can permanently lose data from their local data tree, while remaining members in good standing with the ensemble and continuing to serve client traffic, when the following chain of events occurs.

1. The observer dies in epoch N from machine failure.
2. The observer comes back up in epoch N+1 and requests a snapshot sync to catch up.
3. The machine powers off before the snapshot is synced to disc and after some txn's have been logged (depending on the OS, this can happen!).
4. The observer comes back a second time and replays its most recent snapshot (epoch <= N) as well as the txn logs (epoch N+1).
5. A diff sync is requested from the leader and the observer broadcasts availability.

In this scenario, any commits from epoch N that the observer did not receive before it died the first time will never be exposed to the observer and no part of the ensemble will complain.

This situation is not unique to observers and can happen to any learner. As a simple fix, fsync-ing the snapshots received from the leader will avoid the case of missing snapshots causing data loss.
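The proposed remedy, sketched here in POSIX C for illustration (ZooKeeper's server is Java, where the equivalent durability call is FileChannel.force()): write the snapshot received from the leader, then fsync it before acknowledging, so a power cut cannot leave epoch N+1 txn logs next to a pre-N snapshot:

```c
#include <fcntl.h>
#include <unistd.h>

/* Write the snapshot bytes received from the leader and force them to
   stable storage before the learner acknowledges the sync. Returns 0 on
   success, -1 on any failure. */
static int write_snapshot_durably(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) != 0) { close(fd); return -1; }  /* the missing step */
    return close(fd);
}
```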
100% 100% 600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 12 weeks, 1 day ago 0|i3inmv:
ZooKeeper ZOOKEEPER-2871

ZOOKEEPER-1416 Port ZOOKEEPER-1416 to 3.5.x

Sub-task Open Major Unresolved Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 09/Aug/17 22:51   05/Feb/20 07:16   3.5.3 3.5.8 c client, documentation, java client, server   1 2   Port the work of Persistent Recursive Watchers (ZOOKEEPER-1416) to 3.5.x 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks ago 0|i3imh3:
ZooKeeper ZOOKEEPER-2870

Improve the efficiency of AtomicFileOutputStream

Improvement Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 09/Aug/17 15:59   10/Aug/17 16:35 10/Aug/17 16:15 3.4.10, 3.5.3, 3.6.0 3.4.11, 3.5.4, 3.6.0 server   0 4   The AtomicFileOutputStream extends from FilterOutputStream, where the write function writes data to underlying stream byte by byte: https://searchcode.com/codesearch/view/17990706/, which is very inefficient.

Currently we only use this class to write the dynamic config; because that file is quite small, this isn't a big problem. But in the future we may want to use this class to write the snapshot file, which would take much longer: in internal testing, writing a 600MB snapshot took more than 10 minutes, while using FileOutputStream directly took only 6s.
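The inefficiency can be illustrated in C terms: the inherited FilterOutputStream write(byte[]) degenerates into one underlying write per byte, like the first function below, while the fix forwards the whole buffer to the underlying stream in a single call, like the second. (A sketch of the pattern, not the Java code itself.)

```c
#include <stdio.h>

/* One call (and bounds check) per byte -- the FilterOutputStream default. */
static size_t write_bytewise(FILE *f, const char *buf, size_t len) {
    size_t i;
    for (i = 0; i < len; i++)
        if (fputc(buf[i], f) == EOF)
            break;
    return i;
}

/* One call for the whole buffer -- what an overridden write(byte[],...) does. */
static size_t write_bulk(FILE *f, const char *buf, size_t len) {
    return fwrite(buf, 1, len, f);
}
```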
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks ago 0|i3im1b:
ZooKeeper ZOOKEEPER-2869

Allow for exponential backoff in ClientCnxn.SendThread on connection re-establishment

Improvement Open Minor Unresolved Unassigned Nick Travers Nick Travers 07/Aug/17 17:02   07/Aug/17 19:55   3.4.10, 3.5.3   java client   0 4   As part of ZOOKEEPER-961, when the client re-establishes a connection to the server, it will sleep for a random number of milliseconds in the range [0, 1000). Introduced [here|https://github.com/apache/zookeeper/commit/d84dc077d576b7cdfbfd003e3425fab85ca29a44].

These reconnects can cause excessive logging in clients if the server is unavailable for an extended period of time, with reconnects every 500ms on average.

One solution could be to allow for exponential backoff in the client. The backoff params could be made configurable.

[3.5.x code|https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1059].
[3.4.x code|https://github.com/apache/zookeeper/blob/release-3.4.9/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1051].
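One possible shape of such a backoff (a sketch, not the actual client code): grow the sleep ceiling exponentially per failed attempt, cap it at a configurable maximum, and keep full jitter so many clients don't reconnect in lockstep:

```c
#include <stdlib.h>

/* Returns a randomized delay in [0, ceiling) ms, where the ceiling doubles
   per consecutive failed attempt up to cap_ms. attempt 0 reproduces the
   current behavior (flat jitter under base_ms). */
static int backoff_ms(int attempt, int base_ms, int cap_ms) {
    long ceiling = base_ms;
    int i;
    for (i = 0; i < attempt && ceiling < cap_ms; i++)
        ceiling *= 2;            /* double per consecutive failure */
    if (ceiling > cap_ms)
        ceiling = cap_ms;
    return (int)(rand() % ceiling);  /* full jitter in [0, ceiling) */
}
```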
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks, 3 days ago 0|i3iijb:
ZooKeeper ZOOKEEPER-2868

Clover report does not show test results

Improvement Resolved Minor Duplicate Unassigned Duo Xu Duo Xu 07/Aug/17 14:56   21/May/18 16:11 21/May/18 16:11 3.4.10       0 2   In build.xml, <testsources dir="xxx"/> is not specified in <clover-setup/>, so Clover cannot find the test source code. As a result, Clover reports are incomplete and do not provide per-test coverage info. 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 43 weeks, 3 days ago 0|i3iibz:
ZooKeeper ZOOKEEPER-2867

an expired ZK session can be re-established

Bug Open Major Unresolved Michael Han Jun Rao Jun Rao 04/Aug/17 19:58   30/May/18 20:13   3.4.10       0 4   Not sure if this is a real bug, but I found an instance when a ZK client seems to be able to renew a session already expired by the ZK server.

From ZK server log, session 25cd1e82c110001 was expired at 22:04:39.

{code:java}
June 27th 2017, 22:04:39.000 INFO org.apache.zookeeper.server.ZooKeeperServer Expiring session 0x25cd1e82c110001, timeout of 12000ms exceeded
June 27th 2017, 22:04:39.001 DEBUG org.apache.zookeeper.server.quorum.Leader Proposing:: sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 txntype:-11 reqpath:n/a
June 27th 2017, 22:04:39.001 INFO org.apache.zookeeper.server.PrepRequestProcessor Processed session termination for sessionid: 0x25cd1e82c110001
June 27th 2017, 22:04:39.001 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor Processing request:: sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 txntype:-11 reqpath:n/a
June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.server.quorum.Learner Revalidating client: 0x25cd1e82c110001
June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.server.ZooKeeperServer Client attempting to renew session 0x25cd1e82c110001 at /100.96.5.6:47618
June 27th 2017, 22:05:20.325 INFO org.apache.zookeeper.server.ZooKeeperServer Established session 0x25cd1e82c110001 with negotiated timeout 12000 for client /100.96.5.6:47618
{code}

From ZK client's log, it was able to renew the expired session on 22:05:20.

{code:java}
June 27th 2017, 22:05:18.590 INFO org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001, closing socket connection and attempting reconnect 0
June 27th 2017, 22:05:18.590 WARN org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001 0
June 27th 2017, 22:05:19.325 WARN org.apache.zookeeper.ClientCnxn SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/opt/confluent/etc/kafka/server_jaas.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. 0
June 27th 2017, 22:05:19.326 INFO org.apache.zookeeper.ClientCnxn Opening socket connection to server 100.65.188.168/100.65.188.168:2181 0
June 27th 2017, 22:05:20.324 INFO org.apache.zookeeper.ClientCnxn Socket connection established to 100.65.188.168/100.65.188.168:2181, initiating session 0
June 27th 2017, 22:05:20.327 INFO org.apache.zookeeper.ClientCnxn Session establishment complete on server 100.65.188.168/100.65.188.168:2181, sessionid = 0x25cd1e82c110001, negotiated timeout = 12000 0

{code}
9223372036854775807 No Perforce job exists for this issue. 11 9223372036854775807
1 year, 42 weeks, 1 day ago 0|i3ig7z:
ZooKeeper ZOOKEEPER-2866

Reconfig Causes Newly Joined Node to Crash

Bug Resolved Major Not A Problem Alexander Shraer Jeffrey F. Lukman Jeffrey F. Lukman 04/Aug/17 17:31   09/Aug/17 23:17 09/Aug/17 23:16 3.5.3   leaderElection, quorum, server   0 3   When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
by following the workload in ZK-2778:
* initially start 2 ZooKeeper nodes
* start 3 new nodes and let them join the cluster
* do a reconfiguration where the newly joined will be PARTICIPANTS,
while the previous 2 nodes change to be OBSERVERS

We think our DMCK found the following bug:
* one of the newly joined nodes crashes because it receives an *unexpected* PROPOSAL message from the new leader in the cluster.

For complete information of the bug, please see the document that is attached.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 32 weeks ago 0|i3ig33:
ZooKeeper ZOOKEEPER-2865

Reconfig Causes Inconsistent Configuration file among the nodes

Improvement Resolved Trivial Fixed Alexander Shraer Jeffrey F. Lukman Jeffrey F. Lukman 04/Aug/17 16:53   01/Sep/17 15:30 06/Aug/17 00:22 3.5.3 3.5.4, 3.6.0 documentation   0 5   When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
by following the workload in ZK-2778:
- initially start 2 ZooKeeper nodes
- start 3 new nodes
- do a reconfiguration (the complete reconfiguration is attached in the document)

We think our DMCK found the following bug:
- while one of the just-joined nodes (call it node X) has not yet received the latest configuration update, the initial leader node closes its port, causing node X to become isolated.

For complete information of the bug, please see the document that is attached.

9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 28 weeks, 6 days ago 0|i3ig0n:
ZooKeeper ZOOKEEPER-2864

Add script to run a java api compatibility tool

Improvement Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 04/Aug/17 14:38   10/Aug/17 16:35 10/Aug/17 16:06 3.4.10, 3.5.3 3.4.11, 3.5.4, 3.6.0     0 3   We should use the annotations added in ZOOKEEPER-2829 to run a script to verify api compatibility. See KUDU-1265 for an example. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks ago 0|i3ifrz:
ZooKeeper ZOOKEEPER-2863

downconfig downloads empty file as folder

Bug Resolved Minor Fixed Unassigned Isabelle Giguere Isabelle Giguere 04/Aug/17 12:31   04/Aug/17 13:53 04/Aug/17 13:53 3.4.10       0 1   Windows 7 When downloading a config (ex: a Solr config) from Zookeeper 3.4.10, if a file is empty, it is downloaded as a folder (on Windows, at least).

A ZooKeeper browser (Eclipse: ZooKeeper Explorer) does show it as a file in ZK, however.

Noticed because we keep an empty synonyms.txt file in the Solr config provided with our product, in case a client would want to use it.

The workaround is simple, if the file allows comments: just add a comment, so it is not empty.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks, 6 days ago 0|i3ifin:
ZooKeeper ZOOKEEPER-2862

Incorrect javadoc syntax for web links in StaticHostProvider.java

Bug Resolved Major Fixed Michael Han Michael Han Michael Han 03/Aug/17 14:01   03/Aug/17 14:35 03/Aug/17 14:35 3.5.3, 3.6.0 3.5.4, 3.6.0 documentation, java client   0 3   {{StaticHostProvider#updateServerList}} uses the wrong syntax to embed an https link in the javadoc. Previously this issue was not visible because StaticHostProvider was not part of the public javadoc; after ZOOKEEPER-2829, {{StaticHostProvider}} is part of the API doc, and the incorrect syntax leads to javadoc warnings, which create noise in Jenkins pre-commit jobs. javadoc 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 33 weeks ago 0|i3idxj:
ZooKeeper ZOOKEEPER-2861

Main-Class JAR manifest attribute is incorrect

Bug Resolved Minor Fixed Yaniv Kunda Yaniv Kunda Yaniv Kunda 01/Aug/17 08:23   18/Aug/17 19:41 18/Aug/17 18:30   3.4.11, 3.5.4, 3.6.0 build   0 5   ZOOKEEPER-82 (fixed since 3.0.0) had QuorumPeerMain extracted from QuorumPeer, but the Main-Class attribute in the JAR manifest was not changed accordingly.
Fixing this will make it possible to start the server using "java -jar" instead of specifying the class name.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
Patch
2 years, 30 weeks, 6 days ago 0|i3i9ef:
ZooKeeper ZOOKEEPER-2860

Update sample server jaas config for kerberos auth

Bug Open Major Unresolved Unassigned Andrey Andrey 27/Jul/17 05:14   21/Aug/17 07:19       documentation   0 3   Currently sample server jaas configuration for kerberos contains:
{code}
principal="zookeeper/yourzkhostname"
{code}

Background on why "principal=SPN" and "isInitiator=true" won't work is here:
https://dmdaa.wordpress.com/2010/03/27/the-impact-of-isinitiator-on-jaas-login-configuration-and-the-role-if-spn/

Expected:
{code}
isInitiator=false
principal="zookeeper/yourzkhostname";
{code}

9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 30 weeks, 3 days ago 0|i3i2xj:
ZooKeeper ZOOKEEPER-2859

CMake build doesn't support OS X

Bug Resolved Major Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 26/Jul/17 17:25   13/Aug/17 02:35 13/Aug/17 01:22   3.4.11, 3.5.4, 3.6.0     0 5   OS X 10.12.6 Couple problems:

libm, librt, and libdl are all Linux specific, and provided "for free" on OS X

CppUnit (at least on OS X) needs `-std=c++11`

clang's ld doesn't understand --wrap

I can post an easy patch that at least lets you build the client (but not the tests). The tests use that `--wrap`, and it's non-trivial to fix that on OS X.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 31 weeks, 4 days ago 0|i3i253:
ZooKeeper ZOOKEEPER-2858

Disable reverse DNS lookup for java client

New Feature Open Major Unresolved Unassigned Andrey Andrey 26/Jul/17 10:03   17/Nov/17 17:12   3.4.6   java client   1 6   I have the following setup:
- zookeeper server running in docker container
- kerberos auth

When the client sets up a SASL connection, it creates the service principal name as:
- "principalUserName+"/"+addr.getHostName()",

where:
- addr.getHostName() is the reverse-DNS name of the original server host.

If ZooKeeper nodes are deployed behind a firewall or a software-defined network (the Docker case), the reverse-DNS host won't match the original server host, and this is by design.

If these hosts don't match, the principals won't match and Kerberos auth will fail.

Is it possible to introduce some configuration parameter to disable reverse DNS lookups?
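To illustrate, a simplified model of how the client derives the principal (the helper class is hypothetical; only the principalUserName + "/" + addr.getHostName() construction comes from this issue):

```java
import java.net.InetSocketAddress;

public class SpnDemo {
    // Hypothetical helper mirroring how the client builds the SASL service
    // principal name from the resolved address (simplified illustration).
    static String servicePrincipal(String user, InetSocketAddress addr) {
        // getHostName() performs a reverse DNS lookup when the address was
        // created from a raw IP, which is the behavior this issue questions.
        return user + "/" + addr.getHostName();
    }

    public static void main(String[] args) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved("zk1.example.com", 2181);
        System.out.println(servicePrincipal("zookeeper", addr));
    }
}
```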
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 17 weeks, 6 days ago 0|i3i1cf:
ZooKeeper ZOOKEEPER-2857

Server deal command has problem

Bug Open Major Unresolved Unassigned Bo Hu Bo Hu 26/Jul/17 03:57   31/Jul/17 01:20   3.3.0   server   0 3   NIOServerCnxn.java
private boolean readLength(SelectionKey k) throws IOException
    if (!initialized && checkFourLetterWord(sk, len)) {
        return false;
    }

I think this is a problem: when initialized is true, checkFourLetterWord should also be executed, but it is not.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 33 weeks, 3 days ago 0|i3i0w7:
ZooKeeper ZOOKEEPER-2856

ZooKeeperSaslClient#respondToServer should log exception message of SaslException

Improvement Resolved Minor Fixed Pan Yuxuan Pan Yuxuan Pan Yuxuan 25/Jul/17 05:34   26/Jul/17 21:35 26/Jul/17 20:25 3.4.10, 3.5.3 3.4.11, 3.5.4, 3.6.0     0 5   When an upstream system like HBase calls ZooKeeperSaslClient with security enabled, we sometimes get an error in the HBase logs like:
{noformat}
SASL authentication failed using login context 'Client'.
{noformat}
This error occurs when a SaslException is caught in ZooKeeperSaslClient#respondToServer:
{noformat}
catch (SaslException e) {
LOG.error("SASL authentication failed using login context '" +
this.getLoginContext() + "'.");
saslState = SaslState.FAILED;
gotLastPacket = true;
}
{noformat}
Without an explicit exception message this error leaves the user confused, so I think we should add the exception message to the log.
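A minimal sketch of the proposed improvement (the helper below is illustrative; the real class logs via its LOG field rather than returning a string):

```java
import javax.security.sasl.SaslException;

public class SaslLogDemo {
    // Sketch: build the error line including the exception message, instead of
    // only the login context name (context name "Client" is illustrative).
    static String errorLine(String loginContext, SaslException e) {
        return "SASL authentication failed using login context '" + loginContext
                + "'. Reason: " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(errorLine("Client", new SaslException("GSS initiate failed")));
    }
}
```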
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 34 weeks ago 0|i3hywf:
ZooKeeper ZOOKEEPER-2855

Rebooting a Joined Node Failed Due The Joined Node Previously Failed to Update Its Configuration Correctly

Bug Open Major Unresolved Alexander Shraer Jeffrey F. Lukman Jeffrey F. Lukman 24/Jul/17 16:36   18/May/18 15:29   3.5.3   leaderElection, quorum, server   0 5   We are testing our distributed system model checker (DMCK)
by directing it to reproduce the ZOOKEEPER-2172 bug in ZooKeeper v3.5.3.

After some exploration, our DMCK found that ZOOKEEPER-2172 still lingers in the reportedly fixed version, ZooKeeper v3.5.3.

Here we attached the complete bug scenario to reproduce the bug.
We have communicated this bug to [~shralex] and he has confirmed that this bug exists.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 20 weeks, 3 days ago 0|i3hy5z:
ZooKeeper ZOOKEEPER-2854

fatjar invalid, contains incomplete signature

Bug Open Minor Unresolved Unassigned fragfutter fragfutter 24/Jul/17 09:08   19/May/18 21:08   3.4.10   contrib-fatjar   2 4   The fatjar in 3.4.10 contains signature parts (META-INF/BCKEY.DSA and META-INF/BCKEY.SF). As a result it is not runnable:

Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

Deleting these from the jar solves the issue:
{{zip -d contrib/fatjar/*-fatjar.jar 'META-INF/*.SF' 'META-INF/*.DSA'}}

As far as I know, a jar is signed all-or-nothing.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 43 weeks, 4 days ago 0|i3hxin:
ZooKeeper ZOOKEEPER-2853

The lastZxidSeen in FileTxnLog.java is never being assigned

Bug Resolved Minor Fixed Fangmin Lv Fangmin Lv Fangmin Lv 23/Jul/17 21:33   03/Aug/17 17:04 03/Aug/17 14:41   3.5.4, 3.6.0 server   0 5   There is a log line in FileTxnLog#append to track a txn with a smaller zxid than the last seen one, but lastZxidSeen is never assigned, so that log line is never printed when this happens. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
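The gist of the missing assignment can be sketched as follows (illustrative only; FileTxnLog#append does much more than this):

```java
public class ZxidTracker {
    private long lastZxidSeen = -1;

    // Sketch of the intended append-side check: warn on an out-of-order zxid
    // and, crucially, update lastZxidSeen afterwards (the missing assignment).
    boolean append(long zxid) {
        boolean inOrder = zxid > lastZxidSeen;
        if (!inOrder) {
            System.out.println("Current zxid " + zxid
                    + " is <= " + lastZxidSeen + " for append");
        }
        lastZxidSeen = Math.max(lastZxidSeen, zxid);
        return inOrder;
    }

    public static void main(String[] args) {
        ZxidTracker t = new ZxidTracker();
        System.out.println(t.append(1) + " " + t.append(3) + " " + t.append(2));
    }
}
```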
2 years, 33 weeks ago 0|i3hwlz:
ZooKeeper ZOOKEEPER-2852

Snapshot size factor is not read from system property

Bug Resolved Major Fixed Fangmin Lv Fangmin Lv Fangmin Lv 23/Jul/17 21:10   03/Aug/17 14:31 03/Aug/17 14:28 3.5.3, 3.6.0 3.5.4, 3.6.0 server   0 4   A data inconsistency issue was found when the leader uses on-disk txn files to sync with a learner (ZOOKEEPER-2846). We tried to disable this feature by setting the zookeeper.snapshotSizeFactor system property, but found that this system property is not read anywhere. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 33 weeks ago 0|i3hwlj:
ZooKeeper ZOOKEEPER-2851

ZOOKEEPER-2639 [QP MutualAuth]: add QuorumCnxManager tests that covers quorum auth logic.

Sub-task Open Major Unresolved Michael Han Michael Han Michael Han 23/Jul/17 00:40   05/Feb/20 07:16   3.5.3 3.7.0, 3.5.8 quorum, server, tests   0 2   Some of the ZOOKEEPER-1045 unit tests were implemented as part of {{QuorumCnxManagerTest}}; however, this class is only available in branch-3.4: it was introduced in ZOOKEEPER-1633 to cover upgrade-path testing from 3.4 to 3.5, which is a feature not available in branch-3.5.

This task is to migrate ZOOKEEPER-1045 related tests in {{QuorumCnxManagerTest}} from branch-3.4 to branch-3.5.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks, 1 day ago 0|i3hw1z:
ZooKeeper ZOOKEEPER-2850

ZOOKEEPER-2639 [QP MutualAuth]: Port ZOOKEEPER-2650 and ZOOKEEPER-2759 from branch-3.4 to branch-3.5.

Sub-task Open Major Unresolved Michael Han Michael Han Michael Han 21/Jul/17 12:41   05/Feb/20 07:16   3.5.3 3.7.0, 3.5.8 quorum, security   1 3   These patches are improvements to test cases and small bug fixes made after ZOOKEEPER-1045 was committed to branch-3.4. We need to port them to branch-3.5 to close the loop. 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks, 1 day ago 0|i3huvr:
ZooKeeper ZOOKEEPER-2849

Quorum port binding needs exponential back-off retry

Improvement Open Minor Unresolved Brian Lininger Brian Lininger Brian Lininger 20/Jul/17 12:20   21/Nov/17 13:15   3.4.6, 3.5.3   quorum   0 3   Recently we upgraded the AWS instance type we use for running our ZooKeeper nodes, and by doing so we're intermittently hitting an issue where ZooKeeper cannot bind to the server election port because the IP is incorrect. This is due to name resolution in Route53 not being in sync when ZooKeeper starts on the more powerful EC2 instances. Currently in QuorumCnxManager.Listener, we only attempt to bind 3 times with a 1s sleep between retries, which is not long enough.

I'm proposing to change this to follow an exponential back-off type strategy where each failed attempt causes a longer sleep between retry attempts. This would allow for Zookeeper to gracefully recover when the host is misconfigured, and subsequently corrected, without requiring the process to be restarted while also minimizing the impact to the running instance.
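A capped exponential back-off schedule of the kind proposed might look like this (the base, doubling factor, and cap are illustrative assumptions, not values from the issue):

```java
public class BackoffDemo {
    // Hypothetical exponential back-off schedule for quorum port binding:
    // start at baseMillis, double per failed attempt, never exceed maxMillis.
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long d = baseMillis << Math.min(attempt, 30); // cap the shift to avoid overflow
        return Math.min(d, maxMillis);
    }

    public static void main(String[] args) {
        // Delays for the first six bind attempts with a 1s base and 30s cap
        for (int i = 0; i < 6; i++) {
            System.out.print(delayMillis(i, 1000, 30000) + " ");
        }
        System.out.println();
    }
}
```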
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 17 weeks, 2 days ago 0|i3ht67:
ZooKeeper ZOOKEEPER-2848

Getting this error in ZK server logs : shutdown Leader! reason: Not sufficient followers synced, only synced with sids

Bug Resolved Minor Not A Bug Michael Han prashantkumar prashantkumar 19/Jul/17 09:48   01/Jun/18 18:03 01/Jun/18 18:03 3.5.1   server   0 3   Hi
I am seeing the below error in the ZK logs:
"Unexpected exception causing shutdown while sock still open java.io.EOFException"
and then the ZK server shuts down with the error "shutdown Leader! reason: Not sufficient followers synced, only synced with sids: ".

I am using the zookeeper-3.5.1-alpha version.
It is an ensemble of 2 servers.
Could you please help me resolve this issue?
Thanks

config
{code:java}

initLimit=10
syncLimit=5
maxClientCnxns=0
tickTime=2000
quorumListenOnAllIPs=true
dataDir=/var/run/zookeeper/conf/default
admin.enableServer=false
standaloneEnabled=false

{code}

zookeeper server logs
{code:java}

114829 2017-06-22 11:24:18,182 [myid:2147483652] - INFO [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@649] - Processed session termination for sessionid: 0x40000007cef003d
114830 2017-06-22 11:24:18,182 [myid:2147483652] - INFO [NIOWorkerThread-8:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2147483652, name1=replica.2147483652,name2=Leader,name3=Connections,name4=128.0.0.5,name5=0x40000007cef003d]
114831 2017-06-22 11:24:18,183 [myid:2147483652] - INFO [NIOWorkerThread-8:NIOServerCnxn@606] - Closed socket connection for client /128.0.0.5:34651 which had sessionid 0x40000007cef003d
114832 2017-06-22 11:24:18,421 [myid:2147483652] - ERROR [LearnerHandler-/128.0.0.5:33610:LearnerHandler@604] - Unexpected exception causing shutdown while sock still open
114833 java.io.EOFException
114834 at java.io.DataInputStream.readInt(DataInputStream.java:403)
114835 at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
114836 at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
114837 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
114838 at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515)
114839 2017-06-22 11:24:18,422 [myid:2147483652] - WARN [LearnerHandler-/128.0.0.5:33610:LearnerHandler@619] - ******* GOODBYE /128.0.0.5:33610 ********
114840 2017-06-22 11:24:18,422 [myid:2147483652] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:61808:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 128.0.0.4:42854
114841 2017-06-22 11:24:18,422 [myid:2147483652] - INFO [NIOWorkerThread-4:ZooKeeperServer@969] - Client attempting to renew session 0x40000007cef0001 at /128.0.0.4:42854
114842 2017-06-22 11:24:18,422 [myid:2147483652] - INFO [NIOWorkerThread-4:ZooKeeperServer@678] - Established session 0x40000007cef0001 with negotiated timeout 20000 for client / 128.0.0.4:42854
114843 2017-06-22 11:24:18,423 [myid:2147483652] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:61808:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 128.0.0.4:42862
114844 2017-06-22 11:24:18,423 [myid:2147483652] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:61808:NIOSe

{code}


After some time ..

{code:java}

114851 2017-06-22 11:24:18,423 [myid:2147483652] - INFO [NIOWorkerThread-12:ZooKeeperServer@678] - Established session 0x40000007cef0003 with negotiated timeout 20000 for client / 128.0.0.4:42866
114852 2017-06-22 11:24:19,001 [myid:2147483652] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:61808:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from / 128.0.0.4:42892
114853 2017-06-22 11:24:19,001 [myid:2147483652] - INFO [NIOWorkerThread-13:ZooKeeperServer@964] - Client attempting to establish new session at /128.0.0.4:42892
114854 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [SessionTracker:ZooKeeperServer@384] - Expiring session 0x40000007cef016c, timeout of 20000ms exceeded
114855 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [SessionTracker:QuorumZooKeeperServer@132] - Submitting global closeSession request for session 0x40000007cef016c
114856 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [SessionTracker:ZooKeeperServer@384] - Expiring session 0x40000007cef016d, timeout of 20000ms exceeded
114857 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@649] - Processed session termination for sessionid: 0x40000007cef016c
114858 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [SessionTracker:QuorumZooKeeperServer@132] - Submitting global closeSession request for session 0x40000007cef016d
114859 2017-06-22 11:24:19,211 [myid:2147483652] - INFO [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@649] - Processed session termination for sessionid: 0x40000007cef016d
114860 2017-06-22 11:24:19,579 [myid:2147483652] - INFO [QuorumPeer[myid=2147483652](plain=/0:0:0:0:0:0:0:0:61808)(secure=disabled):Leader@613] - Shutting down
114861 2017-06-22 11:24:19,579 [myid:2147483652] - INFO [QuorumPeer[myid=2147483652](plain=/0:0:0:0:0:0:0:0:61808)(secure=disabled):Leader@619] - Shutdown called
114862 java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ [2147483652] ]
114863 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:619)
114864 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:590)
114865 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1077)
114866 2017-06-22 11:24:19,579 [myid:2147483652] - INFO [QuorumPeer[myid=2147483652](plain=/0:0:0:0:0:0:0:0:61808)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org. apache.ZooKeeperService:name0=ReplicatedServer_id2147483652,name1=replica.2147483652,name2=Leader,name3=Connections,name4=128.0.0.4,name5=0x40000007cef006b]
114867 2017-06-22 11:24:19,579 [myid:2147483652] - INFO [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:61809:Leader$LearnerCnxAcceptor@373] - exception while shutting down acceptor: java.net. SocketException: Socket closed
114868 2017-06-22 11:24:19,581 [myid:2147483652] - INFO [QuorumPeer[myid=2147483652](plain=/0:0:0:0:0:0:0:0:61808)(secure=disabled):NIOServerCnxn@606] - Closed socket connection for client /128.0.0.4:41674 which had sessionid 0x40000007cef006b

{code}
configuration 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 41 weeks, 6 days ago 0|i3hqm7:
ZooKeeper ZOOKEEPER-2847

Cannot bind to client port when reconfig based on old static config

Bug Resolved Major Fixed Yisong Yue Fangmin Lv Fangmin Lv 19/Jul/17 02:14   04/Oct/19 10:55 01/Oct/18 00:11 3.5.3, 3.6.0 3.6.0 server   0 7 0 8400   When the ensemble is started with an old static config whose server string doesn't include the client port, dynamically removing and re-adding the same server causes that server to fail to bind to the client port, so the ZooKeeper server can no longer serve client requests.

From the code, clientAddr is set to null when starting up with an old static config, while the dynamic config is forced to include the <client port> part. This triggers the following rebind code in QuorumPeer#processReconfig and causes the "address already in use" issue.

public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, Long zxid, boolean restartLE) {
    ...
    if (myNewQS != null && myNewQS.clientAddr != null
            && !myNewQS.clientAddr.equals(oldClientAddr)) {
        cnxnFactory.reconfigure(myNewQS.clientAddr);
        updateThreadName();
    }
    ...
}
100% 100% 8400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 24 weeks, 3 days ago 0|i3hozj:
ZooKeeper ZOOKEEPER-2846

Leader follower sync with on disk txns can possibly leads to data inconsistency

Bug Open Critical Unresolved Unassigned Fangmin Lv Fangmin Lv 18/Jul/17 15:34   20/Feb/19 15:15   3.4.10, 3.5.3, 3.6.0   quorum   0 6 0 1800   On-disk txn sync can cause data inconsistency if the current leader had a snap sync just before it became leader; a subsequent diff sync with its followers may then sync the txn gap on disk. Here is the scenario:

Let's say S0 - S3 are followers, and S4 is leader at the beginning:

1. Stop S2 and send one more request
2. Stop S3 and send more requests to the quorum to let S3 have a snap sync with S4 when it started up
3. Stop S4 and S3 became the new leader
4. Start S2 and had a diff sync with S3, now there are gaps in S2

Attached is the test case to verify the issue. Currently, there is no efficient way to check whether a gap in the txn files is a real gap or due to an epoch change. We need to add that support, but before that, it would be safer to disable the on-disk txn leader-follower sync.

Another two scenarios which could cause the same issue:

(Scenario 1) Server A, B, C, A is leader, the others are followers:

1). A synced to disk, but the other 2 restarted before receiving the proposal
2). B and C formed a quorum, B is leader, and committed some requests
3). A goes looking again and syncs with B; B won't be able to TRUNC A, so it sends a snap instead, leaving the extra txn in A's txn file
4). A became the new leader; anyone who then does a diff sync with A will get the extra txn

(Scenario 2) A diff sync with committed txns is applied only to the data tree, not to the on-disk txn file, which also leaves a hole in it and leads to data inconsistency when syncing with learners.
100% 100% 1800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks, 1 day ago 0|i3hoav:
ZooKeeper ZOOKEEPER-2845

Data inconsistency issue due to retain database in leader election

Bug Resolved Critical Fixed Robert Joseph Evans Fangmin Lv Fangmin Lv 14/Jul/17 19:20   13/Sep/18 20:19 23/Feb/18 18:20 3.4.10, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12 quorum   0 10 0 1200   In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time during leader election. In ZooKeeper ensemble, it's possible that the snapshot is ahead of txn file (due to slow disk on the server, etc), or the txn file is ahead of snapshot due to no commit message being received yet.

If the snapshot is ahead of the txn file, then since the SyncRequestProcessor queue is drained during shutdown, the snapshot and txn file will be consistent before leader election happens, so this is not an issue.

But if the txn file is ahead of the snapshot, the ensemble can end up with a data inconsistency issue; here is a simplified scenario to show it:

Let's say we have 3 servers in the ensemble, servers A and B are followers, and C is the leader, and all snapshots and txns are up to T0:
1. A new request reached to leader C to create Node N, and it's converted to txn T1
2. Txn T1 was synced to disk in C, but just before the proposal reaching out to the followers, A and B restarted, so the T1 didn't exist in A and B
3. A and B formed a new quorum after restart, let's say B is the leader
4. C changed to the looking state because it did not have enough followers; it will sync with leader B with last zxid T0, which results in an empty diff sync
5. Before C took a snapshot it restarted; on restart it replayed the txns on disk, which include T1, so it now has node N while A and B do not.

I also included a test case to reproduce this issue consistently.

We have a quite different RetainDB version that avoids this issue by reconciling the snapshot and txn files before leader election; we will submit it for review.
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 27 weeks ago 0|i3hjzr:
ZooKeeper ZOOKEEPER-2844

Zookeeper auto purge process does not purge files

Bug Open Major Unresolved Unassigned Avi Steiner Avi Steiner 13/Jul/17 04:34   06/Oct/18 06:44   3.4.6       2 4   Windows Server 2008 R2. I'm using ZooKeeper 3.4.6

The ZK log data folder keeps growing with transaction logs files (log.*).

I set the following in zoo.cfg:
autopurge.purgeInterval=1
autopurge.snapRetainCount=3
dataDir=..\\data

Per ZK log, it reads those parameters:

2017-07-13 10:36:21,266 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-07-13 10:36:21,266 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1

It also says that cleanup process is running:

2017-07-13 10:36:21,266 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2017-07-13 10:36:21,297 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

But actually nothing is deleted.
Every service restart, a new file is created.

The only parameter I managed to change is preAllocSize, which controls the preallocated size per file. The default is 64MB. I changed it to 10KB just for testing, and I saw the effect as expected: new files were created at 10KB.

I also tried to create a batch file that will run the following:

java -cp zookeeper-3.4.6.jar;lib/slf4j-api-1.6.1.jar;lib/slf4j-log4j12-1.6.1.jar;lib/log4j-1.2.16.jar;conf org.apache.zookeeper.server.PurgeTxnLog .\data -n 3

But it still doesn't do the job.

Please advise.
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 23 weeks, 5 days ago 0|i3hh1z:
ZooKeeper ZOOKEEPER-2843

auth_to_local should support reading rules from a file

Improvement Open Major Unresolved Unassigned Lionel Cons Lionel Cons 13/Jul/17 03:15   31/Jan/19 04:44   3.4.10, 3.5.3   kerberos, server   0 5 0 2400   The current handling of {{zookeeper.security.auth_to_local}} in {{KerberosName.java}} only supports rules given directly as property value.

These rules must therefore be given on the command line and:
* must be escaped properly to avoid shell expansion
* are visible in the {{ps}} output

It would be much better to put these rules in a file and pass the file path as the property value. We would then use something like {{-Dzookeeper.security.auth_to_local=file:/etc/zookeeper/rules}}.

Note that using the {{file:}} prefix allows keeping backward compatibility.
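A sketch of the proposed behavior (the helper name is hypothetical; only the {{file:}} prefix convention comes from this issue):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RulesLoader {
    // Hypothetical loader: if the property value starts with "file:", read the
    // rules from that file; otherwise treat the value itself as the rules.
    // This keeps backward compatibility with inline rule values.
    static String loadRules(String propertyValue) throws IOException {
        if (propertyValue != null && propertyValue.startsWith("file:")) {
            Path rulesFile = Path.of(propertyValue.substring("file:".length()));
            return Files.readString(rulesFile).trim();
        }
        return propertyValue;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("auth_to_local", ".rules");
        Files.writeString(f, "RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//\nDEFAULT\n");
        System.out.println(loadRules("file:" + f));   // rules read from the file
        System.out.println(loadRules("DEFAULT"));     // inline value, unchanged
    }
}
```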
100% 100% 2400 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
1 year, 7 weeks ago 0|i3hgwf:
ZooKeeper ZOOKEEPER-2842

optimize the finish() of Send/RecvWorker in QuorumCnxManager and remove testInitiateConnection() and formates some codes

Improvement Resolved Trivial Won't Do Unassigned maoling maoling 12/Jul/17 08:02   05/Jun/19 05:46 05/Jun/19 05:46     quorum   0 2 0 1200   1. Change the finish() of Send/RecvWorker in QuorumCnxManager to the double-checked locking style [https://en.wikipedia.org/wiki/Double-checked_locking]; this trivial code change gives a finer-grained lock and better performance under heavy multithreaded contention.
2. testInitiateConnection() is a redundant function that is only used by a test case, so refactor it.
3. Some code does not follow the Java coding conventions, so format it.
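For reference, the double-checked locking shape described in point 1 looks like this (a generic sketch, not the actual QuorumCnxManager code):

```java
public class FinishDemo {
    private volatile boolean finished = false;
    int closes = 0; // stand-in for cleanup work; package-visible for the demo

    // Double-checked locking: the unlocked volatile read lets callers skip
    // the monitor entirely once shutdown has already happened, while the
    // second check under the lock keeps the cleanup idempotent.
    void finish() {
        if (!finished) {
            synchronized (this) {
                if (!finished) {
                    finished = true;
                    closes++; // e.g. close the socket, interrupt the worker thread
                }
            }
        }
    }

    public static void main(String[] args) {
        FinishDemo worker = new FinishDemo();
        worker.finish();
        worker.finish();
        System.out.println(worker.closes); // cleanup ran exactly once
    }
}
```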
100% 100% 1200 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 32 weeks, 3 days ago 0|i3hfbb:
ZooKeeper ZOOKEEPER-2841

ZooKeeper public include files leak porting changes

Bug Resolved Major Fixed Andrew Schwartzmeyer Andrew Schwartzmeyer Andrew Schwartzmeyer 07/Jul/17 18:48   01/Aug/17 13:34 01/Aug/17 11:48   3.4.11, 3.5.4, 3.6.0 c client   0 4   Windows 10 with Visual Studio 2017 The fundamental problem is that the port of the C client to Windows is now close to six years old, with very few updates. This port leaks a lot of changes that should be internal to ZooKeeper, and many of those changes are simply no longer relevant. The correct thing to do is attempt to refactor the Windows port for new versions of ZooKeeper, removing dead/unneeded porting code, and moving dangerous porting code to C files instead of public headers.

Two primary examples of this problem are [ZOOKEEPER-2491|https://issues.apache.org/jira/browse/ZOOKEEPER-2491] and [MESOS-7541|https://issues.apache.org/jira/browse/MESOS-7541].

The first issue stems from this ancient porting code:
{noformat}
#define snprintf _snprintf
{noformat}
in [winconfig.h|https://github.com/apache/zookeeper/blob/ddf0364903bf7ac7cd25b2e1927f0d9d3c7203c4/src/c/include/winconfig.h#L179]. Newer versions of Windows C libraries define {{snprintf}} as a function, and so it cannot be redefined.

The second issue comes from this undocumented change:

{noformat}
#undef AF_INET6
{noformat}

again in [winconfig.h|https://github.com/apache/zookeeper/blob/ddf0364903bf7ac7cd25b2e1927f0d9d3c7203c4/src/c/include/winconfig.h#L169] which breaks any library that uses IPv6 and {{winsock2.h}}.

Furthermore, the inclusion of the following defines and headers causes terrible problems for consuming libraries, as they leak into ZooKeeper's public headers:

{noformat}
#define _CRT_SECURE_NO_WARNINGS
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <Winsock2.h>
#include <winstdint.h>
#include <process.h>
#include <ws2tcpip.h>
{noformat}

Depending on the order that a project includes or compiles files, this may or may not cause {{WIN32_LEAN_AND_MEAN}} to become unexpectedly defined, and {{windows.h}} to be unexpectedly included. This problem is exacerbated by the fact that the {{winsock2.h}} and {{windows.h}} headers are order-dependent (if you read up on this, you'll see that defining {{WIN32_LEAN_AND_MEAN}} was meant to work around this).

Going forward, porting changes should live next to where they are used, preferably in source files, not header files, so they remain contained.
windows 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 33 weeks, 2 days ago cmake is added to replace the existing hardcoded (and outdated) visual studio solutions for windows platform. 0|i3h9hj:
ZooKeeper ZOOKEEPER-2840

Should using `System.nanoTime() ^ this.hashCode()` for StaticHostProvider

Bug Open Major Unresolved Benedict Jin Benedict Jin Benedict Jin 05/Jul/17 08:17   05/Feb/20 07:17   3.5.3 3.5.8 java client   0 4 259200 259200 0% We should use `System.nanoTime() ^ this.hashCode()` as the shuffle seed in StaticHostProvider instead of `System.currentTimeMillis()`, because if we have three ZooKeeper server nodes and set `zookeeper.leaderServes` to `no`, client connections will always go to the first ZooKeeper server node. The following test shows why:

```java
@Test
public void testShuffle() throws Exception {
    LinkedList<InetSocketAddress> inetSocketAddressesList = new LinkedList<>();
    inetSocketAddressesList.add(new InetSocketAddress(0));
    inetSocketAddressesList.add(new InetSocketAddress(1));
    inetSocketAddressesList.add(new InetSocketAddress(2));
    /*
    1442045361
    currentTime: 1499253530044, currentTime ^ hashCode: 1500143845389, Result: 1 2 0
    currentTime: 1499253530044, currentTime ^ hashCode: 1500143845389, Result: 2 0 1
    currentTime: 1499253530045, currentTime ^ hashCode: 1500143845388, Result: 0 1 2
    currentTime: 1499253530045, currentTime ^ hashCode: 1500143845388, Result: 1 2 0
    currentTime: 1499253530046, currentTime ^ hashCode: 1500143845391, Result: 1 2 0
    currentTime: 1499253530046, currentTime ^ hashCode: 1500143845391, Result: 1 2 0
    currentTime: 1499253530046, currentTime ^ hashCode: 1500143845391, Result: 1 2 0
    currentTime: 1499253530046, currentTime ^ hashCode: 1500143845391, Result: 1 2 0
    currentTime: 1499253530047, currentTime ^ hashCode: 1500143845390, Result: 1 2 0
    currentTime: 1499253530047, currentTime ^ hashCode: 1500143845390, Result: 1 2 0
    */
    internalShuffleMillis(inetSocketAddressesList);
    /*
    146611050
    currentTime: 22618159623770, currentTime ^ hashCode: 22618302559536, Result: 2 1 0
    currentTime: 22618159800738, currentTime ^ hashCode: 22618302085832, Result: 0 1 2
    currentTime: 22618159967442, currentTime ^ hashCode: 22618302248888, Result: 1 0 2
    currentTime: 22618160135080, currentTime ^ hashCode: 22618302013634, Result: 2 1 0
    currentTime: 22618160302095, currentTime ^ hashCode: 22618301535077, Result: 2 1 0
    currentTime: 22618160490260, currentTime ^ hashCode: 22618301725822, Result: 1 0 2
    currentTime: 22618161566373, currentTime ^ hashCode: 22618300303823, Result: 1 0 2
    currentTime: 22618161745518, currentTime ^ hashCode: 22618300355844, Result: 2 1 0
    currentTime: 22618161910357, currentTime ^ hashCode: 22618291603775, Result: 2 1 0
    currentTime: 22618162079549, currentTime ^ hashCode: 22618291387479, Result: 0 1 2
    */
    internalShuffleNano(inetSocketAddressesList);

    inetSocketAddressesList.clear();
    inetSocketAddressesList.add(new InetSocketAddress(0));
    inetSocketAddressesList.add(new InetSocketAddress(1));

    /*
    415138788
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530050, currentTime ^ hashCode: 1499124456998, Result: 0 1
    currentTime: 1499253530053, currentTime ^ hashCode: 1499124456993, Result: 0 1
    currentTime: 1499253530055, currentTime ^ hashCode: 1499124456995, Result: 0 1
    currentTime: 1499253530055, currentTime ^ hashCode: 1499124456995, Result: 0 1
    currentTime: 1499253530055, currentTime ^ hashCode: 1499124456995, Result: 0 1
    */
    internalShuffleMillis(inetSocketAddressesList);
    /*
    13326370
    currentTime: 22618168292396, currentTime ^ hashCode: 22618156149774, Result: 1 0
    currentTime: 22618168416181, currentTime ^ hashCode: 22618156535703, Result: 1 0
    currentTime: 22618168534056, currentTime ^ hashCode: 22618156432394, Result: 0 1
    currentTime: 22618168666548, currentTime ^ hashCode: 22618155774358, Result: 0 1
    currentTime: 22618168818946, currentTime ^ hashCode: 22618155623712, Result: 0 1
    currentTime: 22618168936821, currentTime ^ hashCode: 22618156011863, Result: 1 0
    currentTime: 22618169056251, currentTime ^ hashCode: 22618155893721, Result: 1 0
    currentTime: 22618169611103, currentTime ^ hashCode: 22618157370237, Result: 1 0
    currentTime: 22618169744528, currentTime ^ hashCode: 22618156713138, Result: 1 0
    currentTime: 22618171273170, currentTime ^ hashCode: 22618184562672, Result: 1 0
    */
    internalShuffleNano(inetSocketAddressesList);
}

private void internalShuffleMillis(LinkedList<InetSocketAddress> inetSocketAddressesList) throws Exception {
    int hashCode = new StaticHostProvider(inetSocketAddressesList).hashCode();
    System.out.println(hashCode);
    int count = 10;
    Random r;
    while (count > 0) {
        long currentTime = System.currentTimeMillis();
        r = new Random(currentTime ^ hashCode);
        System.out.print(String.format("currentTime: %s, currentTime ^ hashCode: %s, Result: ",
                currentTime, currentTime ^ hashCode));
        Collections.shuffle(inetSocketAddressesList, r);
        for (InetSocketAddress inetSocketAddress : inetSocketAddressesList) {
            System.out.print(String.format("%s ", inetSocketAddress.getPort()));
        }
        System.out.println();
        count--;
    }
}

private void internalShuffleNano(LinkedList<InetSocketAddress> inetSocketAddressesList) throws Exception {
    int hashCode = new StaticHostProvider(inetSocketAddressesList).hashCode();
    System.out.println(hashCode);
    int count = 10;
    Random r;
    while (count > 0) {
        long currentTime = System.nanoTime();
        r = new Random(currentTime ^ hashCode);
        System.out.print(String.format("currentTime: %s, currentTime ^ hashCode: %s, Result: ",
                currentTime, currentTime ^ hashCode));
        Collections.shuffle(inetSocketAddressesList, r);
        for (InetSocketAddress inetSocketAddress : inetSocketAddressesList) {
            System.out.print(String.format("%s ", inetSocketAddress.getPort()));
        }
        System.out.println();
        count--;
    }
}
```
0% 0% 259200 259200 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 17 weeks, 1 day ago 0|i3h4lj:
ZooKeeper ZOOKEEPER-2839

Race condition between AcceptThread and SelectorThread may allow connections beyond the max client connection limit

Bug Open Major Unresolved Unassigned Bhupendra Kumar Jain Bhupendra Kumar Jain 05/Jul/17 08:14   05/Jul/17 08:14           0 2   Race condition between AcceptThread and SelectorThread may allow connections beyond the max client connection limit

As per current code in NIOServerCnxnFactory
1. AcceptThread checks the max connection limit, accepts the connection, and adds it to acceptedQueue.
2. Later the selector thread polls the accepted connection and adds the new connection to the connection map.

So if too many concurrent connections arrive at the same time from the same client and the selector thread has not yet processed the already accepted connections from acceptedQueue, AcceptThread will accept more connections beyond the limit, because it still sees a lower current connection count.
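The check-then-act gap can be sketched like this (a simplified model, not the actual NIOServerCnxnFactory code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LimitRace {
    static final int MAX = 1; // illustrative per-client connection limit

    // Check performed by the accept side (AcceptThread's role).
    static boolean acceptAllowed(AtomicInteger live) {
        return live.get() < MAX;
    }

    // Registration performed later by the selector side (SelectorThread's role).
    static void register(AtomicInteger live) {
        live.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicInteger live = new AtomicInteger();
        // Two accepts race past the check before either registration happens,
        // so both pass even though the limit is 1:
        boolean a = acceptAllowed(live);
        boolean b = acceptAllowed(live);
        register(live);
        register(live);
        System.out.println(a + " " + b + " live=" + live.get());
    }
}
```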
ZooKeeper ZOOKEEPER-2838

race-condition when shutting down NIOServerCnxnFactory yields CancelledKeyException

Bug Open Minor Unresolved Unassigned Paul Millar Paul Millar 05/Jul/17 04:18   05/Jul/17 04:18   3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 3.5.3   server   0 2   The problem stems from closing the ServerSocketChannel before stopping the thread(s) working with the corresponding Selector. Closing the ServerSocketChannel will invalidate any SelectionKey objects that have been declared. This is equivalent to calling cancel on the SelectionKey. Therefore, after the ServerSocketChannel's close method is called, it is possible that any thread working with a SelectionKey will experience CancelledKeyException.

I noticed the problem with ZooKeeper v3.4.6, which resulted in the following stack-trace:

{quote}
04 Jul 2017 15:54:15 (zookeeper) [] Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException: null
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) ~[na:1.8.0_131]
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) ~[na:1.8.0_131]
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:187) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{quote}

From manually inspecting the source code, I see the problem is present in all currently released versions of ZooKeeper.
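The fix implied by the report is an ordering constraint: quiesce the thread that uses the Selector before closing the ServerSocketChannel, so no thread can touch a SelectionKey after it has been invalidated. A self-contained sketch under that assumption (not ZooKeeper's actual shutdown code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch of safe shutdown ordering: stop the selector thread
// first, then close the channel that owns the SelectionKeys.
public class OrderedShutdown {
    private final ServerSocketChannel serverChannel;
    private final Selector selector;
    final Thread selectorThread;
    private volatile boolean running = true;

    public OrderedShutdown() throws IOException {
        selector = Selector.open();
        serverChannel = ServerSocketChannel.open();
        serverChannel.bind(new InetSocketAddress("127.0.0.1", 0));
        serverChannel.configureBlocking(false);
        serverChannel.register(selector, SelectionKey.OP_ACCEPT);
        selectorThread = new Thread(() -> {
            while (running) {
                try {
                    selector.select(50);          // keys are still valid here
                } catch (IOException e) {
                    break;
                }
                selector.selectedKeys().clear();
            }
        });
        selectorThread.start();
    }

    public void shutdown() throws IOException, InterruptedException {
        running = false;            // 1. ask the selector thread to stop
        selector.wakeup();
        selectorThread.join();      // 2. wait until no thread uses the keys
        serverChannel.close();      // 3. only now invalidate the keys
        selector.close();
    }
}
```

With this ordering, the close-induced key cancellation can never race with a thread calling `readyOps()` on a cancelled key.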
ZooKeeper ZOOKEEPER-2837

Add a special START_SERVER_JVMFLAGS option only for `start` command to distinguish JVMFLAGS and SERVER_JVMFLAGS

Bug Resolved Major Won't Fix Benedict Jin Benedict Jin Benedict Jin 04/Jul/17 07:55   30/Jan/19 09:51 30/Jan/19 09:51 3.5.3   server   0 3 259200 258600 600 0% Add a special START_SERVER_JVMFLAGS option only for `start` command to distinguish JVMFLAGS and SERVER_JVMFLAGS.

If we use the normal way of adding JVM options, `JVMFLAGS` in `conf/java.env`, it will affect almost all shell scripts under the `bin` directory. Even using `SERVER_JVMFLAGS` will affect some commands such as `zkServer.sh status`, including the four-letter commands.

For example, if the JVMFLAGS is
```bash
export JVMFLAGS="-Xms3G -Xmx3G -Xmn1G -XX:+AlwaysPreTouch -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:-PrintGCTimeStamps -Xloggc:/home/zookeeper/logs/zookeeper_`date '+%Y%m%d%H%M%S'`.gc -XX:-UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
```
then we will get far too many GC log files, because the `mntr` four-letter command is invoked regularly in monitoring setups.
```bash
$ ls ~/logs
zookeeper_20170704175942.gc
zookeeper_20170704180101.gc
zookeeper_20170704180201.gc
zookeeper_20170704180301.gc
zookeeper_20170704180401.gc
...
```
ZooKeeper ZOOKEEPER-2836

QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

Bug Open Critical Unresolved gaoshu Amarjeet Singh Amarjeet Singh 04/Jul/17 07:50   30/Jan/19 11:31   3.4.6   leaderElection, quorum   1 8 0 600   Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
Java Version: jdk64/jdk1.8.0_40
zookeeper version: 3.4.6.2.3.2.0-2950
The QuorumCnxManager Listener thread blocks in ServerSocket.accept(), but we are getting a SocketTimeoutException on our boxes after 49 days 17 hours. As per the current code there are 3 retries, after which it logs "_As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $<hostname>/$<ip>:3888_". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $<hostname>/$<ip>:3888'.


Since no timeout is specified for the ServerSocket, it should never time out; but there are previously discussed issues where people have seen this and added explicit checks for SocketTimeoutException, e.g. https://issues.apache.org/jira/browse/KARAF-3325 .

I think we need to handle SocketTimeoutException along similar lines for ZooKeeper as well.
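One way to express the handling the report asks for is to classify a `SocketTimeoutException` from `accept()` as transient, so the listener loops again instead of burning one of its three retries. A small sketch of that classification (illustrative names, not the actual QuorumCnxManager code):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical retry policy: a SocketTimeoutException from
// ServerSocket.accept() should not count toward the bounded retry budget
// that eventually makes the listener abandon leader election.
public class ListenerRetryPolicy {
    /** True if the listener should simply continue accepting. */
    public static boolean isTransient(IOException e) {
        return e instanceof SocketTimeoutException;
    }

    /** True if the failure should count against the bounded retry budget. */
    public static boolean countsAgainstRetries(IOException e) {
        return !isTransient(e);
    }
}
```

The accept loop would then catch `IOException`, continue when `isTransient` returns true, and only decrement its retry counter otherwise.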
ZooKeeper ZOOKEEPER-2835

Run server with `-XX:+AlwaysPreTouch` jvm flag

Improvement Open Major Unresolved Benedict Jin Benedict Jin Benedict Jin 04/Jul/17 07:35   30/Jan/19 13:43   3.5.3   server   0 2 259200 258000 1200 0% Add the `-XX:+AlwaysPreTouch` jvm flag for the server, so that the JVM demand-zeroes and touches all of its heap memory once, up front, when the process starts.
ZooKeeper ZOOKEEPER-2834

ZOOKEEPER-2355 ZOOKEEPER-2355 fix for branch-3.4

Sub-task Resolved Major Duplicate Unassigned Michael Han Michael Han 04/Jul/17 01:47   09/Oct/17 14:10 09/Oct/17 14:10 3.4.8, 3.4.9, 3.4.10 3.4.11 quorum, server   0 1   Update the patch in ZOOKEEPER-2355 so it applies to branch-3.4. Resolve both ZOOKEEPER-2355 and this JIRA after merging the patch to branch-3.4.
ZooKeeper ZOOKEEPER-2833

Keep the follower transaction up to date after the fix made in ZOOKEEPER-2355.

Improvement Open Major Unresolved Michael Han Michael Han Michael Han 04/Jul/17 01:31   04/Jul/17 01:31   3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 3.5.3   quorum, server   0 2   After the fix made in ZOOKEEPER-2355, the follower's transaction log might not be up to date, because with that fix we never call setlastProcessedZxid during a DIFF sync. For example, imagine a case like this:

* The follower has its latest zxid with value a before the DIFF sync happens.
* The leader sends over proposals with zxids b, c, d.
* The follower receives and applies proposals b and c. Before the follower has a chance to get its hands on d, a network partition happens.
* Now the partition heals, and the follower will do a DIFF sync again. Because the zk database is not reloaded from logs (it's already initialized), the follower has a skewed view of the world - it thinks it only has txn a, but in fact it has a, b, and c. So rather than being sent b, c, and d, the follower could just ask for d.

We should also set the zxid extracted from the current proposal packet after each proposal is committed. This is not functionally critical; it is an optimization, since applying transactions is idempotent.

ZooKeeper ZOOKEEPER-2832

Data Inconsistency occurs if follower has uncommitted transaction in the log while synchronizing with the leader that has the lower last processed zxid

Bug Open Major Unresolved Unassigned Beom Heyn Kim Beom Heyn Kim 02/Jul/17 15:37   02/Jul/17 15:43   3.4.9 3.4.10 quorum   1 6   Synchronization code may fail to truncate an uncommitted transaction in the follower’s transaction log. Here is a scenario:

Initial condition:
Start the ensemble with three nodes A, B and C with C being the leader
The current epoch is 1
For simplicity of the example, let’s say zxid is a two digit number, with epoch being the first digit
Create two znodes ‘key0’ and ‘key1’ whose value is ‘0’ and ‘1’, respectively
The zxid is 12 -- 11 for creating key0 and 12 for creating key1. (For simplicity of the example, the zxid gets increased only by transactions directly changing the data of znodes.)
All the nodes have seen the change 12 and have persistently logged it
Shut down all

Step 1
Start Node A and B. The epoch becomes 2. Then a request, setData(key0, 1000), with zxid 21 is issued. The leader B writes it to the log, but Node A is shut down before writing it to the log. Then the leader B is also shut down. The change 21 is applied only to B, not to A or C.

Step 2
Start Node A and C. Epoch becomes 3. Node A has the higher zxid than Node C (i.e. 20 > 12). So, Node A becomes the leader. Yet, the last processed zxid is 12 for both Node A and C. So, they are in sync already. Node A sends an empty DIFF to Node C. Node C takes a snapshot and creates snapshot.12. Then, A and C are shut down. Now, C has the higher zxid than Node B.

Step 3
Start Node B and C. The epoch becomes 4. Node C has the higher zxid than Node B (i.e. 30 > 21), so Node C becomes the leader. Node B and C have different last processed zxids (i.e. 21 vs 12), and the LinkedList object ‘proposals’ is empty. Thus, Node C sends SNAP to Node B. Node B takes a clean snapshot and creates snapshot.12, as zxid 12 is the last processed zxid of the leader C. (Note the newly created snapshot on B is assigned a lower zxid than the change 21 in the log.) Then the request, setData(key1, 1001), with zxid 41 is issued. Both B and C apply the change 41 to their logs. (Note that now B and C have the same last processed zxid.) Then B and C are shut down.

Step 4
Start Node B and C. The epoch becomes 5. Node B and C use their local log and snapshot files to restore their in-memory data trees. Node B has 1000 as the value of key0, because its latest valid snapshot is snapshot.12 and there was a later transaction with zxid 21 in its log. Yet Node C has 0 as the value of key0, because the change 21 was never written on C. Node C is the leader. Node B and C have the same last processed zxid, i.e. 41, so they are considered to be in sync already, and Node C sends an empty DIFF to Node B. So the synchronization completes with the initially restored in-memory data trees on B and C.

Problem
The value of key0 on B is 1000, while the value of key0 on Node C is 0. LearnerHandler.run on C at Step 3 never sends TRUNC, only SNAP, so the change 21 was never truncated on B. Also, at Step 4, since B uses the snapshot with the lower zxid to restore its in-memory data tree, the change 21 gets into the data tree. Then the leader C at Step 4 did not send SNAP, because the change 41 made to both B and C makes the leader C think B and C are already in sync. Thus, data inconsistency occurs.

The attached test case can deterministically reproduce the bug.
ZooKeeper ZOOKEEPER-2831

ZOOKEEPER-2819 Update documents on getConfig when reconfig is disabled.

Sub-task Open Major Unresolved Unassigned Michael Han Michael Han 30/Jun/17 23:38   30/Jun/17 23:38   3.5.3   documentation   0 1
ZooKeeper ZOOKEEPER-2830

Pre-commit tooling improvement: auto close github pull request when a pull request is merged

Improvement Open Minor Unresolved Unassigned Michael Han Michael Han 30/Jun/17 16:51   30/Jun/17 16:51   3.4.10, 3.5.3   scripts   0 1   The git pull request commit flow script (zk-merge-pr.py) sometimes (e.g. if the "resolve JIRA" step is skipped; it has happened a couple of times to me) does not automatically close the pull request once it is merged. An improvement here would be nice, so that every merge is followed by the closing of the merged pull request; otherwise either the original author of the pull request or Apache Infra has to close it, which is less convenient.
ZooKeeper ZOOKEEPER-2829

Interface usability / compatibility improvements through Java annotation.

Improvement Resolved Major Fixed Abraham Fine Michael Han Michael Han 30/Jun/17 13:14   07/Nov/17 10:28 24/Aug/17 14:57 3.4.10, 3.5.3 3.4.11, 3.5.4, 3.6.0 java client, server   0 3   Hadoop has interface classification regarding an interface's scope and stability. ZK should do something similar, which not only provides the additional benefit of making API compatibility easier to check between releases (or even commits, by automating the checks via some tooling), but is also consistent with the rest of the Hadoop ecosystem.

See HADOOP-5073 for more context.
ZooKeeper ZOOKEEPER-2828

ZOOKEEPER-2819 Test case improvement

Sub-task Open Major Unresolved Unassigned Michael Han Michael Han 30/Jun/17 12:47   01/Jul/17 23:34       leaderElection, quorum, server   0 1   For ZOOKEEPER-2819:
1. Verify that configs are not transferred between peers during leader election phase.
2. Verify that when follower gets a SNAP from leader, the config zNode still has local config instead of using the config deserialized from snapshot.
ZooKeeper ZOOKEEPER-2827

Code refactoring for `JUTE` module

Improvement Open Minor Unresolved Benedict Jin Benedict Jin Benedict Jin 29/Jun/17 00:07   05/Feb/20 07:17   3.5.3 3.7.0, 3.5.8 jute   0 2 259200 258600 600 0% * Fix spelling issues
* Simplify `return` clauses
* Use the enhanced `for` loop
* Use a `try` clause to release stream resources
* Remove unnecessary `new Class[]{...}` boxing
* Remove unnecessary `return` clauses
* Remove unnecessary `import`s
ZooKeeper ZOOKEEPER-2826

Code refactoring for `CLI` module

Improvement Closed Minor Fixed Benedict Jin Benedict Jin Benedict Jin 28/Jun/17 22:25   20/May/19 13:51 31/Jan/19 09:55 3.5.3 3.6.0, 3.5.5 java client   0 3 259200 255600 3600 1% * Fix spelling issues
* Remove unnecessary `import`s
* Make the initialization block related to `options.addOption` static
* Standardize `StringBuilder#append` usage
* Use a `try` clause to release stream resources
ZooKeeper ZOOKEEPER-2825

1. Remove unnecessary import; 2. `contains` instead of `indexOf > -1` for more readable; 3. Standardize `StringBuilder#append` usage for CLIENT module

Improvement Closed Minor Fixed Benedict Jin Benedict Jin Benedict Jin 28/Jun/17 21:24   20/May/19 13:50 31/Jan/19 09:51 3.5.3 3.6.0, 3.5.5 java client   0 4 259200 255600 3600 1% * Remove unnecessary `import`s
* Use `contains` instead of `indexOf > -1` for readability
* Standardize `StringBuilder#append` usage for the CLIENT module
ZooKeeper ZOOKEEPER-2824

`FileChannel#size` info should be added to `FileTxnLog#commit` to solve the confuse that reason is too large log or too busy disk I/O

Improvement Resolved Minor Fixed Benedict Jin Benedict Jin Benedict Jin 28/Jun/17 05:51   01/Feb/18 20:04 01/Feb/18 18:27 3.5.3 3.5.4, 3.6.0 server   0 3 86400 86400 0% `FileChannel#size` info should be added to the `FileTxnLog#commit` logging, to resolve the confusion over whether a slow commit is caused by an overly large log or by busy disk I/O.
ZooKeeper ZOOKEEPER-2823

1. Fix spell issues; 2. Standardize `StringBuilder#append` usage; 3. Using `try` clause for releasing I/O stream for `COMMON` module

Improvement Open Minor Unresolved Benedict Jin Benedict Jin Benedict Jin 27/Jun/17 22:00   05/Feb/20 07:15   3.5.3 3.7.0, 3.5.8 server   0 2 259200 258600 600 0% * Fix spelling issues
* Standardize `StringBuilder#append` usage
* Use a `try` clause to release I/O streams in the `COMMON` module
ZooKeeper ZOOKEEPER-2822

Wrong `ObjectName` about `MBeanServer` in JMX module

Bug Closed Minor Fixed Benedict Jin Benedict Jin Benedict Jin 27/Jun/17 05:40   20/May/19 13:51 27/Nov/18 03:59 3.5.3 3.6.0, 3.5.5 jmx   0 4 86400 75000 11400 13% The `ObjectName` registered with the `MBeanServer` in the JMX module is wrong: it should be `log4j:hierarchy=default` rather than `log4j:hiearchy=default`.
ZooKeeper ZOOKEEPER-2821

1. Fix spell issues; 2. Remove unnecessary boxing / unboxing; 3. Simplify `return` clause; 4. Remove `final` qualifier from `private` method

Improvement Open Minor Unresolved Benedict Jin Benedict Jin Benedict Jin 27/Jun/17 01:21   05/Feb/20 07:17   3.5.3 3.7.0, 3.5.8 security   0 2 259200 258600 600 0% * Fix spelling issues
* Remove unnecessary boxing / unboxing
* Simplify `return` clauses
* Remove the `final` qualifier from `private` methods
ZooKeeper ZOOKEEPER-2820

ZOOKEEPER-2819 Update documentation on how to do rolling restart

Sub-task Open Major Unresolved Michael Han Michael Han Michael Han 26/Jun/17 19:26   26/Jun/17 19:26   3.5.3   documentation   0 2   We should document how to do a rolling restart in the presence of the dynamic reconfig feature, as users might need a rolling restart when the quorum can't be formed, to remove bad nodes so a quorum can form again.
ZooKeeper ZOOKEEPER-2819

Changing membership configuration via rolling restart does not work on 3.5.x.

Bug Resolved Critical Fixed Michael Han Michael Han Michael Han 23/Jun/17 12:06   06/Jul/17 13:38 06/Jul/17 12:49 3.5.0, 3.5.1, 3.5.2, 3.5.3 3.5.4, 3.6.0 quorum, server   0 5   ZOOKEEPER-2820, ZOOKEEPER-2828, ZOOKEEPER-2831 In 3.5.x there is no easy way of changing the membership config using rolling restarts because of the introduction of dynamic reconfig feature in ZOOKEEPER-107, which automatically manages membership configuration parameters.

ZOOKEEPER-2014 introduced a reconfigEnabled flag to turn the reconfig feature on / off. We can use the same flag: when it is set to false, it should disable both in-memory and on-disk updates of membership configuration information, besides disabling the reconfig commands in the CLI (which ZOOKEEPER-2014 already did), so users can continue using rolling restarts if needed.

We should also document explicitly in which release time frame support for membership changes via rolling restarts will be deprecated, and promote reconfig as the replacement.

The problem was raised at user mailing list by Guillermo Vega-Toro, reference thread:
http://zookeeper-user.578899.n2.nabble.com/How-to-add-nodes-to-a-Zookeeper-3-5-3-beta-ensemble-with-reconfigEnabled-false-td7583138.html
ZooKeeper ZOOKEEPER-2818

Improve the ZooKeeper#setACL java doc

Bug Resolved Major Fixed Brahma Reddy Battula Brahma Reddy Battula Brahma Reddy Battula 22/Jun/17 23:41   05/Jul/17 04:42 04/Jul/17 01:56   3.4.11, 3.5.4, 3.6.0     0 5   As per the discussion on the [mailing list|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201706.mbox/browser], it would be better to improve the Java doc or the argument so that new users are not misled.
ZooKeeper ZOOKEEPER-2817

Using `Collections.singletonList` instead of `Arrays.asList(oneElement)`

Improvement Open Minor Unresolved Benedict Jin Benedict Jin Benedict Jin 22/Jun/17 07:32   05/Feb/20 07:16   3.5.3 3.7.0, 3.5.8 server   1 2 259200 258600 600 0% Use `Collections.singletonList` instead of `Arrays.asList(oneElement)` to reuse an immutable object instead of creating a new one.
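The difference the issue describes can be seen in a two-line example: `Collections.singletonList` returns a tiny immutable one-element list, while `Arrays.asList(oneElement)` first allocates a fresh varargs array and then wraps it. A minimal sketch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration of the refactoring proposed in the issue: both calls produce
// an equal one-element list, but singletonList avoids the varargs array.
public class SingletonListExample {
    public static List<String> viaSingleton(String s) {
        return Collections.singletonList(s);   // immutable, single element
    }

    public static List<String> viaAsList(String s) {
        return Arrays.asList(s);               // backed by a freshly allocated array
    }
}
```

Both lists compare equal, so the swap is behavior-preserving wherever the result is not mutated.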
ZooKeeper ZOOKEEPER-2816

Code refactoring for `ZK_SERVER` module

Improvement Resolved Major Fixed Benedict Jin Benedict Jin Benedict Jin 21/Jun/17 21:23   25/Jun/17 19:26 25/Jun/17 18:16 3.5.3 3.5.4, 3.6.0 server   0 4   * Fix spelling issues
* Merge exception catches with the `|` character
* Remove unnecessary boxing
* Remove unused `import`s
* Use the enhanced `for` loop
* Use a `LinkedList` to remove duplicate ACLs
ZooKeeper ZOOKEEPER-2815

1. Using try clause to close resource; 2. Others code refactoring for PERSISTENCE module

Improvement Resolved Major Fixed Benedict Jin Benedict Jin Benedict Jin 21/Jun/17 21:21   25/Jun/17 21:14 25/Jun/17 18:07 3.5.3 3.5.4, 3.6.0 server   0 4   * Use a try clause to close resources
* Other code refactoring for the PERSISTENCE module
ZooKeeper ZOOKEEPER-2814

Ignore space after comma in connection string

Bug Open Minor Unresolved Nikhil Bhide Viliam Durina Viliam Durina 21/Jun/17 05:00   30/Jan/19 11:23   3.5.3       0 7 0 600   I'm using the following connection string:

{{10.0.0.179:2181,<space>10.0.0.176:2181}}

However, I get:

{{java.net.UnknownHostException: 10.0.0.176: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at ...
...}}

The problem was the space after the comma. I suggest either tolerating the space or producing a clear error for it, as this is a real pain to spot. Using the space also makes the connect string more readable, and spaces are not allowed in domain names.
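The tolerant parsing the reporter suggests amounts to trimming whitespace around each comma-separated entry before resolving it; since spaces cannot appear in domain names, trimming is safe. A minimal sketch (illustrative names, not ZooKeeper's real parser):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical connect-string parser that tolerates "host1, host2":
// split on commas, then trim surrounding whitespace from each host:port.
public class ConnectStringParser {
    public static List<String> parse(String connectString) {
        List<String> hosts = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String host = part.trim();    // drop the stray space after the comma
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }
}
```

With this, `"10.0.0.179:2181, 10.0.0.176:2181"` resolves both hosts instead of failing with UnknownHostException on `" 10.0.0.176"`.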
ZooKeeper ZOOKEEPER-2813

Failure tight loop in acceptor

Bug Open Minor Unresolved Unassigned Paul Millar Paul Millar 19/Jun/17 04:54   19/Jun/17 04:54   3.4.8 3.5.0 server   0 3   A failure during accepting an incoming connection results in the acceptor thread being caught in a tight-loop. For example:

{noformat}
13 Jun 2017 15:31:39 (zookeeper) [] Ignoring unexpected runtime exception
java.lang.NullPointerException: null
at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:864) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:418) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:198) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:244) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
13 Jun 2017 15:31:39 (zookeeper) [] Ignoring unexpected runtime exception
java.lang.NullPointerException: null
at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:902) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:418) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:198) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:244) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
13 Jun 2017 15:31:40 (zookeeper) [] Ignoring unexpected runtime exception
java.lang.NullPointerException: null
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:185) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
13 Jun 2017 15:31:40 (zookeeper) [] Ignoring unexpected runtime exception
java.lang.NullPointerException: null
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:185) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
13 Jun 2017 15:31:40 (zookeeper) [] Ignoring unexpected runtime exception
java.lang.NullPointerException: null
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:185) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{noformat}

The first stack-trace is due to ZOOKEEPER-2810, the second is due to ZOOKEEPER-2812.

The other stack-traces (NPE from NIOServerCnxnFactory.java:185) are never-ending, as the service has been caught in a tight-loop.

The reason is that the NIOServerCnxnFactory class fails to guarantee that the `selected` variable is cleared, so the SelectionKey that triggered the bugs remains "live". However, since there are no incoming connections, the call to `accept()` returns null, triggering the NPE.

It appears this problem is fixed with 3.5.0 (with commit 6302d7a7). If back-porting this patch is too invasive, another solution might be to place the `selected.clear()` statement inside the finally-clause of the try-statement.
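The `finally`-based workaround mentioned above can be sketched as follows: clear the selected-key set in a `finally` block, so a runtime exception thrown while handling one key cannot leave it "live" and re-trigger the same failure on the next loop iteration (a minimal sketch, not the actual NIOServerCnxnFactory code):

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// One iteration of a select loop that always forgets processed keys,
// even when key handling throws.
public class SafeSelectLoop {
    public static void processOnce(Selector selector) throws IOException {
        selector.selectNow();
        try {
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                // ... handle accept/read/write; this may throw at runtime ...
            }
        } finally {
            selector.selectedKeys().clear();   // guaranteed, exception or not
        }
    }
}
```

Because the clear happens unconditionally, a stale key can never survive into the next `select()` call.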
ZooKeeper ZOOKEEPER-2812

Racy implicit SessionTracker creation

Bug Open Minor Unresolved Unassigned Paul Millar Paul Millar 19/Jun/17 03:51   19/Jun/17 03:51   3.4.8   server   0 2   As with ZOOKEEPER-2810, NIOServerCnxnFactory#startup currently starts the acceptor thread before initialising the ZooKeeperServer object. This leads to a race condition between any incoming connection and the thread initialising the ZooKeeperServer.

If the incoming connection wins the race then the thread processing this connection will see an uninitialised SessionTracker object, resulting in the following NPE being thrown:

{noformat}
java.lang.NullPointerException: null
at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:902) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:418) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:198) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:244) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) ~[zookeeper-3.4.8.jar:3.4.8--1]
{noformat}

Again, as with ZOOKEEPER-2810, the naive fix (starting the acceptor thread last in NIOServerCnxnFactory#startup method) may fix this issue.
ZooKeeper ZOOKEEPER-2811

PurgeTxnLog#validateAndGetFile: return tag has no arguments.

Bug Resolved Minor Fixed Michael Han Michael Han Michael Han 18/Jun/17 18:36   12/Jul/17 22:59 06/Jul/17 12:53 3.4.10 3.4.11 documentation   0 3   The Java doc of PurgeTxnLog#validateAndGetFile is missing the value of its return tag, which causes a -1 in the JavaDoc category of the pre-commit build:

{noformat}
[javadoc] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/src/java/main/org/apache/zookeeper/server/PurgeTxnLog.java:214: warning - @return tag has no arguments.
{noformat}
ZooKeeper ZOOKEEPER-2810

Racy implicit ZKDatabase creation

Bug Open Minor Unresolved Unassigned Paul Millar Paul Millar 16/Jun/17 09:24   16/Jun/17 09:24   3.4.8   server   0 2   The NIOServerCnxnFactory#startup method first starts the acceptor thread and then initialises the ZooKeeperServer instance. In particular, the call to ZooKeeperServer#startdata method creates the ZKDatabase if it does not already exist.

This creates a race-condition: if the acceptor thread accepts an incoming connection before the ZKDatabase is established then there is a NullPointerException:

{noformat}
java.lang.NullPointerException: null
at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:864) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:418) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:198) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:244) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) ~[zookeeper-3.4.8.jar:3.4.8--1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{noformat}

The same problem appears to be present in release-3.5 and master branches.

The naive fix would be to start the acceptor thread last in NIOServerCnxnFactory#startup, but I can't say whether this would cause any other problems.
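The naive fix described above is purely an ordering change: establish all server state first and start the acceptor thread last, so an early connection can never observe a null database. A sketch under that assumption (names are illustrative, not ZooKeeper's real API):

```java
// Hypothetical startup ordering: the database stands in for ZKDatabase,
// and the acceptor thread is started only after all state is initialised.
public class StartupOrdering {
    private Object database;            // stands in for ZKDatabase
    private Thread acceptorThread;
    volatile boolean acceptorStarted;

    public void startup() {
        database = new Object();        // 1. establish the database first
        // 2. any other server initialisation goes here
        acceptorThread = new Thread(() -> { /* accept loop would go here */ });
        acceptorStarted = true;
        acceptorThread.start();         // 3. accept connections only now
    }

    /** True only once every dependency of the accept path exists. */
    public boolean ready() {
        return database != null && acceptorStarted;
    }
}
```

As the report itself notes, whether this ordering change has other side effects in the real NIOServerCnxnFactory is an open question; the sketch only illustrates the invariant.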
ZooKeeper ZOOKEEPER-2809

Unnecessary stack-trace in server when the client disconnect unexpectedly

Bug Resolved Minor Fixed Mark Fenes Paul Millar Paul Millar 16/Jun/17 07:07   04/Oct/17 17:51 11/Sep/17 18:08 3.4.8 3.4.11 server   0 5   In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this with a stack-trace (see src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).

This is unfortunate as we are using an embedded ZK server in our project (in a test environment) and we consider all stack-traces as bugs.

I noticed that ZK 3.5 and later no longer log a stack-trace. This change is due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems to fix this issue almost as a side-effect; a similar change in master has the same effect.

I was wondering if the change in how EndOfStreamException is logged (i.e., logging the message without a stack-trace) could be back-ported to 3.4 branch, so could be included in the next 3.4 release.
ZooKeeper ZOOKEEPER-2808

ACL with index 1 might be removed if it's only being used once

Bug Resolved Critical Fixed Fangmin Lv Fangmin Lv Fangmin Lv 15/Jun/17 13:46   19/Jun/17 00:02 18/Jun/17 13:23 3.6.0 3.5.4, 3.6.0 server   0 6   When ZooKeeper starts up, it creates the DataTree instance, in which the empty config znode is created with the READ_UNSAFE ACL; the ACL is stored in a map at index 1. Then it loads the snapshot from disk: the nodes and the ACL map are cleared, but the reconfig znode still references ACL index 1. The reconfig znode is reused, so it may actually reference a different ACL stored in the snapshot. After leader-follower syncing, the reconfig znode is added back again (if it doesn't exist), which removes the previous reference to ACL index 1; if index 1 then has 0 references it is removed from the ACL map, which can leave that ACL unusable, and that znode will not be readable.

Error logs related:
-----------------------------
2017-06-12 12:02:21,443 [myid:2] - ERROR [CommitProcWorkThread-14:DataTree@249] - ERROR: ACL not available for long 1
2017-06-12 12:02:21,444 [myid:2] - ERROR [CommitProcWorkThread-14:FinalRequestProcessor@567] - Failed to process sessionid:0x201035cc882002d type:getChildren cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
java.lang.RuntimeException: Failed to fetch acls for 1
at org.apache.zookeeper.server.DataTree.convertLong(DataTree.java:250)
at org.apache.zookeeper.server.DataTree.getACL(DataTree.java:799)
at org.apache.zookeeper.server.ZKDatabase.getACL(ZKDatabase.java:574)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:463)
at org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:439)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
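The failure mode described above can be reduced to a generic reference-counted cache. A hypothetical sketch (not ZooKeeper's actual ReferenceCountedACLCache) of how releasing the last reference to index 1 evicts an entry that a znode loaded from an older snapshot may still point to:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduction of a reference-counted ACL map; not ZooKeeper's real code.
public class RefCountedAclMap {
    private final Map<Long, String> aclByIndex = new HashMap<>();
    private final Map<Long, Integer> refCount = new HashMap<>();
    private long nextIndex = 1;

    public long add(String acl) {
        long idx = nextIndex++;
        aclByIndex.put(idx, acl);
        refCount.put(idx, 1);
        return idx;
    }

    public void release(long idx) {
        int count = refCount.getOrDefault(idx, 0) - 1;
        if (count <= 0) {
            // Last reference gone: the entry is evicted even though a znode
            // loaded from an older snapshot may still point at this index.
            refCount.remove(idx);
            aclByIndex.remove(idx);
        } else {
            refCount.put(idx, count);
        }
    }

    public String get(long idx) {
        return aclByIndex.get(idx);
    }
}
```

After the release, `get(1)` returns null, which corresponds to the "ACL not available for long 1" condition in the error log above.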
upgrade
2 years, 39 weeks, 4 days ago
ZooKeeper ZOOKEEPER-2807

ZOOKEEPER-3170 Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

Sub-task Resolved Major Fixed Andor Molnar Abraham Fine Abraham Fine 13/Jun/17 17:56   31/Jan/19 10:05 05/Nov/18 02:50         0 6 0 19800   100% 100% 19800 0 pull-request-available
1 year, 19 weeks, 3 days ago
ZooKeeper ZOOKEEPER-2806

Flaky test: org.apache.zookeeper.server.quorum.FLEBackwardElectionRoundTest.testBackwardElectionRound

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 13/Jun/17 14:42   26/Mar/18 14:27 26/Mar/18 14:27 3.4.10, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12     0 6
2 years, 17 weeks, 2 days ago
ZooKeeper ZOOKEEPER-2805

NullPointerException when using no merge merge policy and too many disk components

Bug Resolved Major Invalid Unassigned Abdullah Alamoudi Abdullah Alamoudi 09/Jun/17 12:50   12/Jun/17 19:29 12/Jun/17 19:29         0 2   A stack trace is not available, but the failure occurs during the LSMBtreeSearchCursor.close() call.
2 years, 40 weeks, 3 days ago
ZooKeeper ZOOKEEPER-2804

Node creation fails with NPE if ACLs are null

Bug Resolved Major Fixed Bhupendra Kumar Jain Bhupendra Kumar Jain Bhupendra Kumar Jain 09/Jun/17 06:20   18/Aug/17 19:41 18/Aug/17 17:39   3.6.0     0 5   If a null ACL list is passed, ZK node creation or setACL fails with an NPE:
{code}
java.lang.NullPointerException
at org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301)
at org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341)
at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519)
at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126)
at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178)
{code}

The server is expected to handle the null and return a proper error code to the client.
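The requested behavior amounts to validating the ACL list before it is deduplicated. A minimal sketch with hypothetical names (validateAcl and InvalidAclException are illustrations, not the real fixupACL/KeeperException signatures):

```java
import java.util.List;

public class AclValidation {

    // Hypothetical stand-in for a "bad arguments" KeeperException.
    public static class InvalidAclException extends Exception {
        public InvalidAclException(String msg) { super(msg); }
    }

    // Validate before deduplication so a null (or null-containing) ACL list
    // yields an error code to the client, not a NullPointerException deep
    // inside the request processor.
    public static void validateAcl(List<String> acl) throws InvalidAclException {
        if (acl == null || acl.isEmpty() || acl.contains(null)) {
            throw new InvalidAclException("invalid ACL");
        }
    }
}
```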
2 years, 30 weeks, 6 days ago
ZooKeeper ZOOKEEPER-2803

Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

Bug Resolved Major Not A Problem Abraham Fine Abraham Fine Abraham Fine 08/Jun/17 18:08   09/Jun/17 17:33 09/Jun/17 17:23 3.4.10       0 2   Please ignore. I tested against the wrong version of ZooKeeper and this was resolved by ZOOKEEPER-1653

-We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.-

{code}
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}

-along with this strange stack trace in the logs:-
{code}
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}

-It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. {{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. The interrupt may be triggered by the repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not appear to have the same problem.-

-I was able to find another JIRA ticket describing a similar issue here: https://issues.apache.org/jira/browse/DERBY-4963-

-There is also interesting discussion in ZOOKEEPER-1835 (where the change was made for 3.5; see https://issues.apache.org/jira/browse/ZOOKEEPER-1835), although those discussions appear to be Windows-centric (we noticed the issue on Linux).-

-The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build #3241" but jenkins cleared out the logs (I only still have the test report from the mailing list).-

-In addition, {{testWorkerThreads}} appears to be failing every few months on Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430 and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote this Jenkins had cleaned out the logs from the latest failed run so I have no way of determining if the cause is the same.-
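For reference, the struck-through analysis hinges on FileChannel#force being interruptible while FileDescriptor#sync is not. A minimal sketch of the 3.5-style durable write (class and method names are assumptions for illustration; error handling elided):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class EpochFileWriter {

    // Write a value and flush it to disk with FileDescriptor#sync, which is
    // not interruptible. By contrast, FileChannel#force throws
    // ClosedByInterruptException if the writing thread has been interrupted,
    // which is the failure mode described in the struck-through analysis.
    public static void writeDurably(File file, String value) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(value.getBytes("UTF-8"));
            out.getFD().sync();
        }
    }
}
```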
2 years, 41 weeks ago
ZooKeeper ZOOKEEPER-2802

Zookeeper C client hang @wait_sync_completion

Bug Open Critical Unresolved Unassigned yihao yang yihao yang 07/Jun/17 19:05   03/Sep/17 20:40   3.4.6   c client   1 4   Environment: DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS". I was using the ZooKeeper 3.4.6 C client to access a ZooKeeper server in a VM. The VM environment is not stable and I get a lot of EXPIRED_SESSION_STATE events. I create another session to ZK when I get an expired event. I also have a read/write lock to protect session reads (get/list/... on ZK) and writes (connect, close, reconnect of the zhandle).
The problem is that when the session got an EXPIRED_SESSION_STATE event and tried to take the write lock to reconnect, it found that another thread was holding the read lock (performing a synchronous list on ZK). See the stack below:

GDBStack:
Thread 7 (Thread 0x7f838a43a700 (LWP 62845)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000000000636033 in wait_sync_completion (sc=sc@entry=0x7f8344000af0) at src/mt_adaptor.c:85
#2 0x0000000000633248 in zoo_wget_children2_ (zh=<optimized out>, path=0x7f83440677a8 "/dict/objects/__services/RLS-GSE/_static_nodes", watcher=0x0, watcherCtx=0x13e6310, strings=0x7f838a4397b0, stat=0x7f838a4398d0) at src/zookeeper.c:3630
#3 0x000000000045e6ff in ZooKeeperContext::getChildren (this=0x13e6310, path=..., children=children@entry=0x7f838a439890, stat=stat@entry=0x7f838a4398d0) at zookeeper_context.cpp:xxx

This synchronous list call did not return ZINVALIDSTATE but hung. Does anyone know the cause?
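The pattern described, a reconnect path blocking on a write lock while a reader is stuck inside a synchronous call, can be sketched with a ReentrantReadWriteLock (hypothetical names; the real client is C):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SessionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // The reconnect path must take the write lock. If a reader never
    // returns, a plain writeLock().lock() here would hang forever; a timed
    // tryLock at least surfaces the hang instead of deadlocking silently.
    public boolean tryReconnect(long timeoutMs) throws InterruptedException {
        if (lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
                return true; // the actual reconnect would happen here
            } finally {
                lock.writeLock().unlock();
            }
        }
        return false; // timed out: a reader still holds the lock
    }

    // Exposed so callers (the read paths: get/list/...) can hold the read lock.
    public ReentrantReadWriteLock.ReadLock readLock() {
        return lock.readLock();
    }
}
```

Note that ReentrantReadWriteLock also forbids upgrading a held read lock to a write lock, so a thread that tries to reconnect while still holding its own read lock will time out the same way.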
2 years, 28 weeks, 3 days ago
ZooKeeper ZOOKEEPER-2801

address spelling errors/typos

Improvement Open Trivial Unresolved tony mancill tony mancill tony mancill 06/Jun/17 01:25   30/Jan/19 09:35   3.5.3       0 2 0 600   This is a follow-on for ZOOKEEPER-2617 (for which I only supplied a patch for branch-3.4) that addresses minor typos in master. With a slight modification, the patch also applies to the branch-3.5 branch.

If folks are curious, the typos were spotted with the spellintian tool shipped with Debian's lintian package.
100% 100% 600 0 pull-request-available
Patch
2 years, 40 weeks, 6 days ago
ZooKeeper ZOOKEEPER-2800

zookeeper ephemeral node not deleted after server restart and consistency is not hold

Bug Open Critical Unresolved Unassigned Jiafu Jiang Jiafu Jiang 06/Jun/17 00:39   06/Nov/19 01:31   3.4.11   quorum 06/Jun/17 0 5   Environment: CentOS 6.5, Java 8. I deployed a ZooKeeper cluster with three nodes:

ofs_zk1:30.0.0.72
ofs_zk2:30.0.0.73
ofs_zk3:30.0.0.99

On 2017-06-02, I used the C ZK client to create some ephemeral sequential nodes:
/adm_election/rolemgr/rolemgr0000000008,
/adm_election/rolemgr/rolemgr0000000011,
/adm_election/rolemgr/rolemgr0000000012,

with a session timeout of 20000 ms.

Then I restarted ofs_zk1 and ofs_zk2.


On 2017-06-05, I found that these ephemeral nodes still existed on ofs_zk1;
I can see them with the zkCli.sh get command on ofs_zk1.
But these nodes do not exist on ofs_zk2 and ofs_zk3.
Isn't that odd?


I have uploaded the whole deploy directory of the three nodes to:
https://pan.baidu.com/s/1miohiCo ,
The log is printed in log/zookeeper.out.

The log of ofs_zk3 is too large, so I only include the first 1000 lines.

Since I noticed this problem a little late, some snapshots and logs may have been deleted.
I hope someone can help find the reason.
19 weeks, 1 day ago
ZooKeeper ZOOKEEPER-2799

Separate logs generated by tests in jenkins jobs when tests are run in parallel

Bug Open Major Unresolved Abraham Fine Abraham Fine Abraham Fine 02/Jun/17 16:24   06/Jun/17 15:59           0 2   We often run our tests in parallel (for example, in our GitHub hook). This makes the tests difficult to debug, since all of them log to the console and their output becomes interwoven. We should have each test log to its own file.
2 years, 41 weeks, 2 days ago
ZooKeeper ZOOKEEPER-2798

Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 01/Jun/17 15:06   08/Jun/17 13:26 08/Jun/17 11:54 3.4.10, 3.5.3 3.4.11, 3.5.4, 3.6.0     0 6   This test appears to be failing intermittently on both 3.4 and 3.5. Here are a couple of example failing jobs.

3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/

3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/
2 years, 41 weeks ago
ZooKeeper ZOOKEEPER-2797

Invalid TTL from misbehaving client nukes zookeeper

Bug Resolved Major Fixed Patrick White Patrick White Patrick White 30/May/17 19:32   21/Jan/19 09:54 31/May/17 23:46 3.6.0 3.5.4, 3.6.0 security, server   0 6   I was adding container and TTL support to kazoo and managed to screw something up, which set the TTL to a negative value. This invalid TTL blew up the commit processor and got written to the log, preventing the ZooKeeper servers from starting back up. ttl_nodes
2 years, 42 weeks ago
ZooKeeper ZOOKEEPER-2796

Test org.apache.zookeeper.ZooKeeperTest.testCreateNodeWithoutData is broken by ZOOKEEPER-2757

Test Resolved Minor Fixed Michael Han Michael Han Michael Han 30/May/17 12:14   30/May/17 15:40 30/May/17 13:44   3.5.4, 3.6.0 tests   0 3   ZOOKEEPER-2757 broke one test, which caused recent daily builds to fail.

{noformat}
FAILED: org.apache.zookeeper.ZooKeeperTest.testCreateNodeWithoutData

Error Message:
Path must start with / character

Stack Trace:
org.apache.zookeeper.cli.MalformedPathException: Path must start with / character
at org.apache.zookeeper.cli.CreateCommand.exec(CreateCommand.java:122)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:655)
at org.apache.zookeeper.ZooKeeperTest.testCreateNodeWithoutData(ZooKeeperTest.java:293)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
{noformat}
2 years, 42 weeks, 2 days ago
ZooKeeper ZOOKEEPER-2795

Change log level for "ZKShutdownHandler is not registered" error message

Wish Resolved Trivial Fixed Abraham Fine Andy Chambers Andy Chambers 30/May/17 11:32   07/May/18 03:57 21/Nov/17 12:39 3.4.10, 3.5.3, 3.6.0 3.5.4, 3.6.0, 3.4.12     0 4   We have an integration test suite that starts up an embedded version of zookeeper as part of a suite of services.

However, because it doesn't register a shutdown handler, we get lots of warnings that look like this:

17-05-30 15:04:56 achambers.local ERROR [org.apache.zookeeper.server.ZooKeeperServer:472] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes

My Java is a bit rusty, but I think I can't create one of these shutdown handlers from outside the "org.apache.zookeeper.server" package because the constructor has not been marked "public". Would it be possible to make it public?
2 years, 17 weeks, 2 days ago
ZooKeeper ZOOKEEPER-2794

ZOOKEEPER-2639 [QP MutualAuth]: Revisit auth logic to handle dynamically add/remove servers

Sub-task Open Major Unresolved Unassigned Rakesh Radhakrishnan Rakesh Radhakrishnan 29/May/17 05:40   05/Feb/20 07:17     3.7.0, 3.5.8 quorum, security   0 3   This jira is to revisit the basic authn/authz logic to handle servers dynamically joining an ensemble.
2 years, 42 weeks, 3 days ago
ZooKeeper ZOOKEEPER-2793

ZOOKEEPER-2639 [QP MutualAuth]: Implement a mechanism to build "authzHosts" for dynamic reconfig servers

Sub-task Open Major Unresolved Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 29/May/17 05:37   05/Feb/20 07:16     3.7.0, 3.5.8 quorum, security   0 4   {{QuorumServer}} will do the authorization checks against the configured authorized hosts. During leader election, a QuorumLearner will send an authentication packet to the QuorumServer. The QuorumServer will then check that the connecting QuorumLearner's hostname exists in the authorized hosts; if it does not, the connecting peer is not authorized to join this ensemble and the request is rejected immediately.

In {{branch-3.4}}, building the {{authzHosts}} list is pretty straightforward: it can use the ensemble server details in the zoo.cfg file. But with dynamic reconfig, it has to consider dynamically added/removed/updated servers, and we need to discuss ways to handle the dynamic cases.
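The check described above reduces to membership in an authorized-host set. A minimal sketch (class and host names are assumptions; in branch-3.4 the set would be built from the server.N entries in zoo.cfg, while dynamic reconfig would have to mutate it as servers join and leave):

```java
import java.util.Set;

public class QuorumAuthzSketch {
    private final Set<String> authzHosts;

    public QuorumAuthzSketch(Set<String> authzHosts) {
        this.authzHosts = authzHosts;
    }

    // Reject a connecting learner whose hostname is not in the authorized
    // set; with dynamic reconfig this set can no longer be a static
    // snapshot of zoo.cfg.
    public boolean isAuthorized(String learnerHost) {
        return authzHosts.contains(learnerHost);
    }
}
```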
2 years, 15 weeks, 2 days ago
ZooKeeper ZOOKEEPER-2792

ZOOKEEPER-2639 [QP MutualAuth]: Port ZOOKEEPER-1045 implementation from branch-3.4 to branch-3.5

Sub-task Resolved Major Fixed Michael Han Rakesh Radhakrishnan Rakesh Radhakrishnan 29/May/17 05:27   23/Aug/17 00:13 24/Jul/17 13:40   3.5.4 quorum, security   0 5   This jira is to merge the basic working patch covering the authentication and authorization of static (zoo.cfg) ZooKeeper servers from the {{branch-3.4}} code base.
2 years, 30 weeks, 1 day ago
ZooKeeper ZOOKEEPER-2791

Quorum doesn't recover after zxid rollover

Bug Open Major Unresolved Abraham Fine Mike Heffner Mike Heffner 24/May/17 11:03   26/May/17 14:41   3.3.6, 3.4.8   leaderElection, quorum   1 5   Environment: Ubuntu 14.04.4 LTS, AWS EC2, 5-node ensembles. When the zxid rolls over, the ensemble is unable to recover without manually restarting the cluster. The leader enters the shutdown() state when the zxid rolls over, but the remaining four nodes in the ensemble are not able to re-elect a new leader. This state persisted for at least 15 minutes before an operator manually restarted the cluster and the ensemble recovered.

Config:
--------
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/raid0/zookeeper
clientPort=2181
maxClientCnxns=100
autopurge.snapRetainCount=14
autopurge.purgeInterval=24
leaderServes: True
server.7=172.26.134.88:2888:3888
server.6=172.26.136.143:2888:3888
server.5=172.26.135.103:2888:3888
server.4=172.26.134.16:2888:3888
server.9=172.26.135.19:2888:3888

Logs:

https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640
2 years, 43 weeks ago
ZooKeeper ZOOKEEPER-2790

Should we consider using `LongAdder` instead of `AtomicLong`

Improvement Open Major Unresolved Unassigned Benedict Jin Benedict Jin 24/May/17 05:23   31/May/17 00:13   3.5.3 4.0.0 server   0 3   ```java
// -Xmx512M -Xms512M -Xmn256M -XX:+AlwaysPreTouch -ea
@Test
public void pressureLongAdder() throws Exception {
    final LongAdder longAdder = new LongAdder();
    ExecutorService executorService = Executors.newCachedThreadPool();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 100; i++) {
        executorService.submit(new Thread(() -> {
            for (int j = 0; j < 1000_0000; j++) {
                longAdder.increment();
            }
            System.out.print(String.format("%s %s \t", Thread.currentThread().getId(), longAdder.longValue()));
/*
14 19607585 12 36445036 20 38985288 38 76821270 70 117094732 18 127252576
22 137043349 26 153411172 30 164051380 34 165971155 102 192241678 134 201104979
158 232657818 46 279030056 174 288502545 94 347965290 198 348060553 118 348087414
36 353092712 28 357762215 44 365464475 126 379518198 54 379623515 182 380077075
142 385263911 78 389013887 62 389085727 110 389122678 86 389920423 166 393535019
150 396382512 190 403100499 32 403161217 208 403197689 206 406065520 16 410725026
24 415347205 40 415379997 48 415733397 104 418507295 192 423244160 176 455793362
168 458311865 160 463028656 136 496375440 72 541243645 186 561877000 170 575352229
162 584152392 154 604552121 138 614092854 64 638151890 114 668705836 58 669235250
188 699213410 156 729222401 124 754336889 100 784326386 76 813479501 120 827569944
66 830236567 98 832153503 112 841408676 204 849520891 210 852391130 202 864804732
172 875603834 194 877222893 200 881090909 88 882809513 80 882846368 56 887174571
178 889682247 140 901357028 146 902169049 184 904540678 152 915608988 130 917896629
116 924616135 144 927674541 122 930399321 128 939791111 106 942656234 84 950848174
96 951904067 90 954910184 74 964338213 196 966487766 82 968307139 52 975854400
180 977385398 164 978882525 50 980896807 148 988292352 132 989090669 108 996891232
92 996921398 42 996938988 68 996953941 60 1000000000
*/
        }));
    }
    executorService.shutdown();
    while (!executorService.isTerminated()) {
        Thread.sleep(1);
    }
    long endTime = System.currentTimeMillis();
    System.out.println("\n" + (endTime - startTime)); // 3275 ms
}

// -Xmx512M -Xms512M -Xmn256M -XX:+AlwaysPreTouch -ea
@Test
public void pressureAtomicLong() throws Exception {
    final AtomicLong atomicLong = new AtomicLong();
    ExecutorService executorService = Executors.newCachedThreadPool();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 100; i++) {
        executorService.submit(new Thread(() -> {
            for (int j = 0; j < 1000_0000; j++) {
                atomicLong.getAndIncrement();
            }
            System.out.print(String.format("%s %s \t", Thread.currentThread().getId(), atomicLong.longValue()));
/*
12 390000000 28 390000000 44 390000000 20 390000000 26 390000000 18 390000000
80 390000000 56 390000000 96 390000000 24 390000000 88 390000000 72 390000000
22 390000000 118 390000000 54 390000000 142 390000000 70 390000000 86 390000000
182 390000000 110 390000000 62 390000000 78 390000000 102 390000000 158 390000000
150 390000000 46 390000000 38 390000000 126 390000000 94 390000000 134 390000000
14 390000000 48 390000000 40 390000000 32 390000000 34 390000000 64 390000000
42 390000000 36 390000000 16 390000000 180 416396554 204 419908287 196 425536497
92 732203658 30 733835560 202 733835559 210 733873571 146 733878564 186 733883527
170 733888686 76 733892691 84 733888815 148 733901560 162 733907032 172 733908079
52 733913280 116 733918421 124 733906868 164 733920945 132 733891348 68 733923672
108 733924928 156 733926091 60 733921998 140 733927257 188 733928891 154 733871822
194 733830477 178 733872527 100 733830322 106 748251688 144 1000000000 98 1000000000
58 1000000000 90 1000000000 130 1000000000 138 1000000000 114 1000000000 104 1000000000
168 1000000000 200 1000000000 184 1000000000 160 1000000000 174 1000000000 112 1000000000
190 1000000000 198 1000000000 82 1000000000 206 1000000000 166 1000000000 176 1000000000
136 1000000000 208 1000000000 74 1000000000 122 1000000000 152 1000000000 192 1000000000
120 1000000000 128 1000000000 66 1000000000 50 1000000000
*/
        }));
    }
    executorService.shutdown();
    while (!executorService.isTerminated()) {
        Thread.sleep(1);
    }
    long endTime = System.currentTimeMillis();
    System.out.println("\n" + (endTime - startTime)); // 19409 ms
}
```
2 years, 42 weeks, 1 day ago
ZooKeeper ZOOKEEPER-2789

Reassign `ZXID` for solving 32bit overflow problem

Bug Open Major Unresolved Benedict Jin Benedict Jin Benedict Jin 22/May/17 21:45   14/Dec/19 06:07   3.5.3 3.7.0 quorum   1 7 604800 603600 1200 0% At `1k/s` ops, the ZXID counter will be exhausted after $2^{32} / (86400 \cdot 1000) \approx 49.7$ days. But if we reassign the `ZXID` into 16 bits for the `epoch` and 48 bits for the `counter`, the problem will not occur until after $\min(2^{16} / 365, 2^{48} / (86400 \cdot 1000 \cdot 365)) \approx \min(179.6, 8925.5) = 179.6$ years.

However, the ZXID is of `long` type. In the JVM, reading and writing a `long` (and likewise a `double`) may be split into separate operations on the high 32 bits and the low 32 bits. Because the `ZXID` variable is not declared `volatile` and is not boxed into the corresponding reference type (`Long` / `Double`), such accesses are [non-atomic operations](https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7). So a read may observe a torn value, with the high 32 bits from one write and the low 32 bits from another, leading to concurrency problems.
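The figures above can be checked with simple arithmetic, assuming a sustained 1k ops/s and (for the 16-bit epoch) one leader election per day:

```java
public class ZxidRollover {

    // Days until an n-bit counter overflows at the given ops/sec rate.
    public static double daysUntilOverflow(int counterBits, long opsPerSecond) {
        return Math.pow(2, counterBits) / (opsPerSecond * 86400.0);
    }

    public static void main(String[] args) {
        // Current layout: 32-bit counter at 1k ops/s -> ~49.7 days.
        System.out.printf("32-bit counter: %.1f days%n", daysUntilOverflow(32, 1000));
        // Proposed: 48-bit counter at 1k ops/s -> ~8925 years.
        System.out.printf("48-bit counter: %.1f years%n", daysUntilOverflow(48, 1000) / 365.0);
        // 16-bit epoch at one election per day -> ~179.6 years.
        System.out.printf("16-bit epoch:   %.1f years%n", Math.pow(2, 16) / 365.0);
    }
}
```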
0% 0% 1200 603600 604800 pull-request-available
2 years, 13 weeks, 6 days ago
ZooKeeper ZOOKEEPER-2788

The definition of MAX_CONNECTION_ATTEMPTS in QuorumCnxManager.java seems useless; should it be removed?

Improvement Resolved Minor Fixed Abraham Fine Jiafu Jiang Jiafu Jiang 21/May/17 03:41   01/Jun/17 12:07 31/May/17 23:56 3.4.10, 3.5.3, 3.4.11 3.4.11, 3.5.4, 3.6.0 leaderElection, quorum   0 5   The definition of MAX_CONNECTION_ATTEMPTS in QuorumCnxManager.java seems useless; should it be removed?
2 years, 42 weeks ago
ZooKeeper ZOOKEEPER-2787

CLONE - ZK Shell/Cli re-executes last command on exit

Bug Resolved Major Won't Fix Edward Ribeiro Mostafa Shahdadi Mostafa Shahdadi 20/May/17 19:07   06/Nov/19 13:52 06/Nov/19 13:52   3.4.6, 3.5.0 scripts   0 2   Environment: zookeeper-3.4.3 release. In the ZK 3.4.3 release's version of zkCli.sh, the last command that was executed is *re*-executed when you {{ctrl+d}} out of the shell. In the snippet below, {{ls}} is executed and then {{ctrl+d}} is triggered (inserted below to illustrate); the output from {{ls}} appears again, due to the command being re-run.
{noformat}
[zk: zookeeper.example.com:2181(CONNECTED) 0] ls /blah
[foo]
[zk: zookeeper.example.com:2181(CONNECTED) 1] <ctrl+d> [foo]
$
{noformat}
cli, shell, zkcli, zkcli.sh
2 years, 43 weeks, 5 days ago
ZooKeeper ZOOKEEPER-2786

Flaky test: org.apache.zookeeper.test.ClientTest.testNonExistingOpCode

Bug Resolved Major Fixed Abraham Fine Abraham Fine Abraham Fine 19/May/17 14:33   09/Aug/17 15:26 09/Aug/17 14:31 3.4.10, 3.5.3 3.4.11, 3.5.4, 3.6.0     0 5   This test is broken on both 3.4 and 3.5, but in "different" ways. Please see the individual pull requests for detailed descriptions of the issues faced in both branches.

2 years, 32 weeks, 1 day ago
ZooKeeper ZOOKEEPER-2785

Server inappropriately throttles connections under load before SASL completes

Bug Resolved Critical Fixed Abhishek Singh Chouhan Abhishek Singh Chouhan Abhishek Singh Chouhan 17/May/17 03:45   23/May/17 22:34 18/May/17 17:15 3.4.10 3.4.11, 3.5.4, 3.6.0 server   0 12   When a ZK server is running close to its outstanding-requests limit, it incorrectly throttles the SASL request. This leaves the client waiting for the final SASL packet (the session is already established) and deferring all non-priming packets, including pings, until then. The client then waits for the final packet, never gets it, and times out saying it hasn't heard from the server. This is fatal for services such as HBase, which retry for a finite number of attempts and then exit.

The issue is that in ZooKeeperServer.processPacket(..), in the SASL case we send the response and incorrectly also call cnxn.incrOutstandingRequests(h), which throttles the connection if we're over the outstanding-requests limit. This results in the server not processing the subsequent packet from the client. We also do not have any pending request to send on the connection, and hence never call enableRecv(). We should return after sending the response to the SASL request.
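The proposed fix reduces to an early return: answer the SASL packet and skip the throttling bookkeeping. A hypothetical sketch (names modeled on, but not copied from, ZooKeeperServer.processPacket):

```java
public class SaslPacketSketch {

    // Hypothetical connection abstraction standing in for ServerCnxn.
    public interface Connection {
        void sendResponse(byte[] payload);
        void incrOutstandingRequests(); // throttles when over the limit
    }

    // Return immediately after answering a SASL packet so the connection is
    // never throttled mid-handshake; only normal requests count against the
    // outstanding-request limit. Returns true when the early-return path ran.
    public static boolean processPacket(Connection cnxn, boolean isSasl, byte[] payload) {
        if (isSasl) {
            cnxn.sendResponse(payload);
            return true; // early return: do NOT call incrOutstandingRequests
        }
        cnxn.sendResponse(payload);
        cnxn.incrOutstandingRequests();
        return false;
    }
}
```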
sasl
2 years, 44 weeks ago
ZooKeeper ZOOKEEPER-2784

Add some limitations on code level for `SID` to avoid configuration problem

Improvement Open Major Unresolved Unassigned Benedict Jin Benedict Jin 16/May/17 00:04   14/Dec/19 06:09   3.5.2 3.7.0 quorum   0 2 604800 604200 600 0% So far, `QuorumCnxManager#receiveConnection` cannot detect a duplicate `SID`, so the ZooKeeper cluster will start successfully. But the cluster is not healthy, and it will report problems like `not synchronized`. So I think we should add some checks at the code level for the `SID` to surface such configuration problems earlier. 0% 0% 600 604200 604800 pull-request-available
2 years, 43 weeks, 3 days ago
ZooKeeper ZOOKEEPER-2783

follower disconnects and cannot reconnect

Bug Resolved Major Duplicate Ben Sherman Ben Sherman Ben Sherman 12/May/17 20:43   11/Jun/17 14:59 11/Jun/17 14:59 3.4.10 3.4.11, 3.5.4 leaderElection   0 3   Environment: CentOS 7, AWS EC2. We have a 5-node cluster running 3.4.10 (we saw this in .8 and .9 as well), and sometimes a node gets a read timeout, drops all its connections and tries to re-establish itself to the quorum. It can usually do this in a few seconds, but last night it took almost 15 minutes to reconnect.

These are 5 servers in AWS, and we've tried tuning the timeouts, but they are exceeding any reasonable timeout and still failing.

In the attached logs, 5 is a follower, 3 is the leader. 5 loses connectivity at 11:21:34. 3 sees the disconnect at the same moment.

5 tries to re-establish the quorum, but cannot do it until the connections to the other servers expire at 11:37:02. After the connections are re-established, 5 connects immediately.

At 11:41:08, the operator restarted the server, and it reconnected normally.

I suspect there is a problem with stale connections to the rest of the quorum - the other services on this box were fine (monitoring, puppet) and able to establish new connections with no problems.

I posed this problem to the zookeeper-users list and was asked to open a ticket.
2 years, 40 weeks, 4 days ago
ZooKeeper ZOOKEEPER-2782

Cannot log in

Wish Open Trivial Unresolved Unassigned test test 12/May/17 09:31   12/May/17 09:31           0 1
2 years, 44 weeks, 6 days ago
ZooKeeper ZOOKEEPER-2781

ZOOKEEPER-3170 Flaky test: testClientAuthAgainstNoAuthServerWithLowerSid

Sub-task Resolved Major Cannot Reproduce Andor Molnar Abraham Fine Abraham Fine 11/May/17 21:25   30/Jan/19 11:08 25/Oct/18 11:15 3.4.10       0 5 0 600   Here is an example failing job: https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1489/ 100% 100% 600 0 pull-request-available
1 year, 21 weeks ago
ZooKeeper ZOOKEEPER-2780

if directories of connectString don't exist, then...

Improvement Open Major Unresolved Unassigned Xiaoshuang LU Xiaoshuang LU 11/May/17 05:11   11/May/17 05:30       java client   0 1   {code}
public static void main(String[] stringArray) {
    try {
        // None of a, b, and c exist.
        // Can I create them with the following ZooKeeper object?
        ZooKeeper zooKeeper =
            new ZooKeeper(
                "address1:port1,address2:port2,address3:port3,address4:port4,address5:port5/a/b/c",
                60000,
                null);

        zooKeeper.create("/d", null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        zooKeeper.close();
    } catch (Exception e) {
        LOGGER.error("", e);
    }
}
{code}
2 years, 45 weeks ago
ZooKeeper ZOOKEEPER-2779

Add option to not set ACL for reconfig node

Improvement Open Major Unresolved Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 09/May/17 10:03   05/Feb/20 07:16   3.5.3 3.7.0, 3.5.8 server   0 5 0 3000   ZOOKEEPER-2014 changed the behavior of the /zookeeper/config node by setting the ACL to {{ZooDefs.Ids.READ_ACL_UNSAFE}}. This change makes it very cumbersome to use the reconfig APIs. It also, perversely, makes security worse, as the entire ZooKeeper instance must be opened to the "super" user while reconfig is enabled (per {{ReconfigExceptionTest.java}}). Provide a mechanism for savvy users to disable this ACL so that an application-specific custom ACL can be set. 100% 100% 3000 0 pull-request-available
2 years, 42 weeks ago
ZooKeeper ZOOKEEPER-2778

Potential server deadlock between follower sync with leader and follower receiving external connection requests.

Bug Closed Blocker Fixed Michael K. Edwards Michael Han Michael Han 06/May/17 01:08   04/Oct/19 10:55 07/Dec/18 07:12 3.5.3 3.6.0, 3.5.5 quorum   0 8 0 31800   It's possible to have a deadlock during the recovery phase.
Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest [1]. Here is a sample thread dump that illustrates the state of the execution:

{noformat}
[junit] java.lang.Thread.State: BLOCKED
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
[junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
[junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
[junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
[junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
[junit]


[junit] java.lang.Thread.State: BLOCKED
[junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
[junit] at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
[junit] at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
{noformat}

The deadlock happens between the quorum peer thread, which runs the follower doing the sync-with-leader work, and the listener thread of the same peer's QuorumCnxManager (QCM), which does the connection-receiving work. To finish syncing with the leader, the follower must synchronize on both QV_LOCK and the QCM object it owns, while the receiver thread, to finish setting up an incoming connection, must synchronize on the same QCM object and the same QV_LOCK. The problem is that the two threads acquire the two locks in different orders, so depending on timing and actual execution order, each thread may end up acquiring one lock while holding the other.

[1] org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
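The usual fix for this class of bug is a single global lock order. A toy sketch (lock names are borrowed from the report; this is not the actual ZooKeeper code) in which both code paths take QV_LOCK before the connection-manager monitor, so neither thread can hold one lock while waiting for the other:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockOrdering {
    // Hypothetical stand-ins for QV_LOCK and the QuorumCnxManager monitor.
    private static final Object QV_LOCK = new Object();
    private static final Object QCM_LOCK = new Object();

    // Both paths acquire QV_LOCK first, then QCM_LOCK: with a consistent
    // global order, the circular wait needed for deadlock cannot form.
    static void syncWithLeader(CountDownLatch done) {
        synchronized (QV_LOCK) {
            synchronized (QCM_LOCK) {
                done.countDown();
            }
        }
    }

    static void receiveConnection(CountDownLatch done) {
        synchronized (QV_LOCK) {      // same order, NOT QCM_LOCK first
            synchronized (QCM_LOCK) {
                done.countDown();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(2);
        new Thread(() -> syncWithLeader(done)).start();
        new Thread(() -> receiveConnection(done)).start();
        boolean finished = done.await(5, TimeUnit.SECONDS);
        System.out.println(finished ? "no deadlock" : "deadlock");
    }
}
```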
100% 100% 31800 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
1 year, 14 weeks, 6 days ago 0|i3ekp3:
ZooKeeper ZOOKEEPER-2777

There is a typo in zk.py which prevents using/compiling it.

Bug Resolved Major Fixed Nikhil Bhide Frederic Leger Frederic Leger 05/May/17 11:09   11/Sep/17 18:32 11/Sep/17 17:29 3.4.10 3.4.11, 3.5.4, 3.6.0 contrib   1 8 3600 3600 0% Linux While trying to create an RPM from zookeeper 3.4.10, I got an error when it tried to compile the file :

zookeeper/contrib/zkpython/src/python/zk.py", line 55
"""Pretty print(a zookeeper tree, starting at root""")
^
SyntaxError: invalid syntax
0% 0% 3600 3600 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 27 weeks, 3 days ago 0|i3ejqf:
ZooKeeper ZOOKEEPER-2776

Election fails when the (middle-id) node is down

Bug Open Major Unresolved Unassigned chenbo chenbo 05/May/17 04:30   05/May/17 04:30   3.4.6   leaderElection   0 3   We found a bug in ZooKeeper leader election in our environment. It can easily be reproduced in a 3-node cluster with default settings.
# Assume the ZooKeeper service is down on all nodes and node 3 has a bigger zxid than node 1; this makes node 3 the potential leader.
# Take node 2 down (or drop all incoming packets with a firewall).
# Start the ZooKeeper service on node 1 and node 3.

The ZooKeeper cluster cannot be successfully established in such a case. The following can be found in, and verified from, the logs:
# Notifications to node 2 always time out.
# Node 3 is always LEADING but always fails with "Timeout while waiting for epoch from quorum". It rarely becomes a FOLLOWER during this period.
# Node 1 is always FOLLOWING but always fails to connect to the leader. It gives up after 5 attempts, and then another round of election starts, again and again.
# Node 3 decides to become the leader 1s after node 1 gives up contacting it.
# Node 3 always receives notification packets 5s after node 1 does.

Then we analyzed the source code of zookeeper-3.4.6 and found:
# In an election, ZooKeeper sends leader election messages sequentially, with a connection timeout of 5s by default. This imposes a 5s receive delay on the nodes whose ids come after the down node's: they get the same election notification 5s later than the nodes with ids smaller than the down node's.

In the case above, node 3 realized the situation and jumped into LEADING status 5s after node 1 decided to follow it. As a follower, node 1 tried to connect to the leader 5 times with a 1s interval (hard-coded), which means all followers give up connecting to the leader after about 4s. By the time the follower gave up, node 3 had not yet become the leader.

So, is there any way to configure around or otherwise avoid this problem?
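The timing described above can be sketched numerically (the constants are the defaults and hard-coded values cited in the report; variable names are illustrative):

```java
public class ElectionTiming {
    public static void main(String[] args) {
        int cnxTimeoutMs = 5000;   // election connection timeout (default 5s)
        int followerRetries = 5;   // hard-coded connect-to-leader attempts
        int retryIntervalMs = 1000;

        // Node 3 only sees the election outcome after the 5s timeout
        // spent trying to reach the down node 2.
        int leaderReadyMs = cnxTimeoutMs;

        // Node 1 makes 5 attempts spaced 1s apart, so it gives up
        // roughly 4s after its first attempt.
        int followerGivesUpMs = (followerRetries - 1) * retryIntervalMs;

        System.out.println("leader ready at ~" + leaderReadyMs + " ms");
        System.out.println("follower gives up at ~" + followerGivesUpMs + " ms");
        System.out.println("election stalls: " + (followerGivesUpMs < leaderReadyMs));
    }
}
```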
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 45 weeks, 6 days ago 0|i3ej0n:
ZooKeeper ZOOKEEPER-2775

ZK Client not able to connect with Xid out of order error

Bug Resolved Critical Fixed Mohammad Arshad Bhupendra Kumar Jain Bhupendra Kumar Jain 05/May/17 02:12   25/Jan/18 17:07 12/Jun/17 19:18 3.4.10, 3.5.3, 3.6.0 3.4.11, 3.5.4, 3.6.0 java client   0 8  
During a network-unreachable scenario in one of our clusters, we observed "Xid out of order" and "Nothing in the queue" errors continuously, and the ZK client was finally unable to connect successfully to the ZK server.

*Logs:*

unexpected error, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447)
java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 for a packet with details: clientPath:null serverPath:null finished:false header:: 53,101 replyHeader:: 0,0,-4 request:: 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes} response:: null
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Nothing in the queue, but got 1
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)

*Analysis:*
1) Initially, the client fails to do a SASL login due to a network-unreachable problem.
2017-03-29 10:03:59,377 | WARN | [main-SendThread(192.168.130.8:24002)] | SASL configuration failed: javax.security.auth.login.LoginException: Network is unreachable (sendto failed) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. | org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307)
Here the boolean saslLoginFailed becomes true.

2) After some time the network connection recovers and the client is successfully able to log in, but the boolean saslLoginFailed is still not reset to false.

3) Now the SASL negotiation between client and server starts, and during this time no user request will be sent (the socket channel is closed for writes until the SASL negotiation completes).
4) Now the server's response to the SASL packet is processed by the client. The client assumes that tunnelAuthInProgress() is finished (the method checks the saslLoginFailed boolean; since the boolean is true, it assumes auth is done), tries to process the packet as an ordinary packet, and this results in the errors above.

*Solution:* Reset the saslLoginFailed boolean before every client login attempt.
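A minimal model of the proposed fix (field and method names mirror the report, but this is a sketch, not the actual ClientCnxn code): resetting saslLoginFailed at the start of every login attempt keeps tunnelAuthInProgress() reporting true while the negotiation is still pending, so SASL packets are not misread as ordinary responses:

```java
public class SaslState {
    private boolean saslLoginFailed = false;
    private boolean negotiationComplete = false;

    void login(boolean networkUp) {
        saslLoginFailed = false;      // the proposed fix: reset before every attempt
        if (!networkUp) {
            saslLoginFailed = true;   // LoginException path (network unreachable)
        }
    }

    void finishNegotiation() { negotiationComplete = true; }

    // Mirrors tunnelAuthInProgress(): auth is still in progress unless
    // login failed outright or the negotiation has completed.
    boolean tunnelAuthInProgress() {
        return !saslLoginFailed && !negotiationComplete;
    }

    public static void main(String[] args) {
        SaslState s = new SaslState();
        s.login(false);   // network unreachable: login fails
        s.login(true);    // network back: login succeeds, flag reset
        // Without the reset, saslLoginFailed would still be true here and
        // SASL responses would be treated as user packets (Xid mismatch).
        System.out.println("auth in progress: " + s.tunnelAuthInProgress());
    }
}
```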
9223372036854775807 No Perforce job exists for this issue. 1 9223372036854775807
2 years, 8 weeks ago 0|i3eiuv:
ZooKeeper ZOOKEEPER-2774

Ephemeral znode will not be removed when the session times out, if the system time of the ZooKeeper node changes unexpectedly.

Bug Resolved Major Fixed Jiafu Jiang Jiafu Jiang Jiafu Jiang 03/May/17 22:28   12/Jul/17 22:59 18/May/17 17:31 3.4.8, 3.4.9, 3.4.10 3.4.11 server   1 5   Centos6.5 1. Deploy a ZooKeeper cluster with one node.
2. Create an ephemeral znode.
3. Change the system time of the ZooKeeper node to an earlier point.
4. Disconnect the client from the ZooKeeper server.

Then the ephemeral znode will exist for a long time, even after the session times out.

I have read the ZooKeeper source code and found the following code in SessionTrackerImpl.java:
{code:title=SessionTrackerImpl.java|borderStyle=solid}
@Override
synchronized public void run() {
    try {
        while (running) {
            currentTime = System.currentTimeMillis();
            if (nextExpirationTime > currentTime) {
                this.wait(nextExpirationTime - currentTime);
                continue;
            }
            SessionSet set;
            set = sessionSets.remove(nextExpirationTime);
            if (set != null) {
                for (SessionImpl s : set.sessions) {
                    setSessionClosing(s.sessionId);
                    expirer.expire(s);
                }
            }
            nextExpirationTime += expirationInterval;
        }
    } catch (InterruptedException e) {
        handleException(this.getName(), e);
    }
    LOG.info("SessionTrackerImpl exited loop!");
}
{code}

I think it may be better to use System.nanoTime() rather than System.currentTimeMillis(), because the latter can be changed manually, or automatically by an NTP client.
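A small illustration of the suggestion: differences between two System.nanoTime() readings track elapsed time monotonically, whereas System.currentTimeMillis() reads the wall clock, which can jump backward if the system time is adjusted:

```java
public class ElapsedTime {
    public static void main(String[] args) throws InterruptedException {
        // nanoTime() is a monotonic clock: only differences between two
        // readings are meaningful, and they are immune to wall-clock changes.
        long start = System.nanoTime();
        Thread.sleep(50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;

        // The interval stays correct even if the wall clock were set back
        // during the sleep; currentTimeMillis() offers no such guarantee.
        System.out.println("elapsed at least 40 ms: " + (elapsedMs >= 40));
    }
}
```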
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 43 weeks, 6 days ago 0|i3egs7:
ZooKeeper ZOOKEEPER-2773

zookeeper-service

Bug Open Major Unresolved Unassigned Ashwath Ashwath 03/May/17 00:35   18/Jan/18 03:54   3.4.10 3.4.10 quorum   0 6   Linux Hi
I run ZooKeeper on 3 Linux machines.

1. I downloaded the zookeeper-3.4.10 archive and extracted it.
2. I copied zoo_sample.cfg to zoo.cfg, edited dataDir, and added the 3 IP addresses.
3. I created a new file called myid and put the server's number in it.
Now the ZooKeeper cluster runs successfully, but

When I try to run it as a service, I get the following error:

zookeeper.service - Apache ZooKeeper
Loaded: loaded (/lib/systemd/system/zookeeper.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2017-05-03 09:56:28 IST; 1s ago
Process: 678 ExecStart=/home/melon/software/ZooKeeper/zk/bin/zkServer.sh start-foreground (code=exited
Main PID: 678 (code=exited, status=127)

May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Unit entered failed state.
May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Failed with result 'exit-code'.

Here is the unit file I added:

[Unit]
Description=Apache ZooKeeper
After=network.target
ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/zoo.cfg
ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/log4j.properties

[Service]
Environment="ZOOCFGDIR=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf"
SyslogIdentifier=zookeeper
WorkingDirectory=/home/melon/software/ZooKeeper
ExecStart=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/bin/zkServer.sh start-foreground
Restart=on-failure
RestartSec=20
User=root
Group=root

Thank you
beginner 9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 9 weeks ago 0|i3eei7:
ZooKeeper ZOOKEEPER-2772

Delete node command does not honor Acl policy

Bug Resolved Major Not A Bug Unassigned joe smith joe smith 02/May/17 17:19   17/May/17 10:32 17/May/17 10:32 3.4.8, 3.4.10   security   0 3   I set the ACL so that the node could not be deleted, but was able to delete it regardless.

I am not familiar with the code, but a reply from Martin in the user@ mailing list seems to confirm the issue. I will paste his response below - sorry for the long listing.

Martin's reply are inline prefixed with: MG>

----------
From: joe smith <water4u99@yahoo.com.INVALID>
Sent: Tuesday, May 2, 2017 8:40 AM
To: user@zookeeper.apache.org
Subject: Acl block detete not working

Hi,
I'm using 3.4.10 and setting a custom acl to block deletion of a znode. However, I'm able to delete the node even after I've changed its ACL from cdrwa to cra.

Can anyone point out if I missed some step.

Thanks for the help

Here is the trace:
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]

[zk: localhost:2181(CONNECTED) 1] create /test "data"
Created /test

[zk: localhost:2181(CONNECTED) 2] ls /
[zookeeper, test]

[zk: localhost:2181(CONNECTED) 3] addauth myfqdn localhost
[zk: localhost:2181(CONNECTED) 4] setAcl /test myfqdn:localhost:cra
cZxid = 0x2
ctime = Tue May 02 08:28:42 EDT 2017
mZxid = 0x2
mtime = Tue May 02 08:28:42 EDT 2017
pZxid = 0x2
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0

MG>in SetAclCommand you can see the acl being parsed and acl being set by setAcl into zk object

List<ACL> acl = AclParser.parse(aclStr);
int version;
if (cl.hasOption("v")) {
    version = Integer.parseInt(cl.getOptionValue("v"));
} else {
    version = -1;
}
try {
    Stat stat = zk.setACL(path, acl, version);

MG>later on in DeleteCommand there is no check for aforementioned acl parameter
public boolean exec() throws KeeperException, InterruptedException {
    String path = args[1];
    int version;
    if (cl.hasOption("v")) {
        version = Integer.parseInt(cl.getOptionValue("v"));
    } else {
        version = -1;
    }

    try {
        zk.delete(path, version);
    } catch(KeeperException.BadVersionException ex) {
        err.println(ex.getMessage());
    }
    return false;
MG>as seen here the testCase works properly saving the Zookeeper object
LsCommand entity = new LsCommand();
entity.setZk(zk);


MG>but setACL does not save the zookeeper object anywhere but instead seems to discard zookeeper object with accompanying ACLs

MG>can you report this bug to Zookeeper?
https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel


MG>Thanks Joe!

[zk: localhost:2181(CONNECTED) 5] getAcl /test
'myfqdn,'localhost
: cra

[zk: localhost:2181(CONNECTED) 6] get /test
data
cZxid = 0x2
ctime = Tue May 02 08:28:42 EDT 2017
mZxid = 0x2
mtime = Tue May 02 08:28:42 EDT 2017
pZxid = 0x2
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0

[zk: localhost:2181(CONNECTED) 7] set /test "testwrite"
Authentication is not valid : /test

[zk: localhost:2181(CONNECTED) 8] delete /test
[zk: localhost:2181(CONNECTED) 9] ls /
[zookeeper]

[zk: localhost:2181(CONNECTED) 10]
The auth provider implementation is here: http://s000.tinyupload.com/?file_id=42827186839577179157
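For context, this behaved as designed (the issue was resolved as Not A Bug): in ZooKeeper, the DELETE permission on a znode governs deleting that znode's *children*, so `delete /test` is checked against the ACL of the parent "/" (which is open, world:anyone:cdrwa, by default), not against the cra ACL on /test itself. A self-contained sketch of the permission bits (constants copied here from ZooDefs.Perms so the example runs without the ZooKeeper jar):

```java
public class AclPerms {
    // Bit values as defined by org.apache.zookeeper.ZooDefs.Perms.
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    // Parse a permission string like "cra" the way the CLI's AclParser does.
    static int parse(String s) {
        int perms = 0;
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'r': perms |= READ; break;
                case 'w': perms |= WRITE; break;
                case 'c': perms |= CREATE; break;
                case 'd': perms |= DELETE; break;
                case 'a': perms |= ADMIN; break;
            }
        }
        return perms;
    }

    public static void main(String[] args) {
        int testAcl = parse("cra");    // ACL set on /test in the trace above
        int rootAcl = parse("cdrwa");  // default open ACL on the parent "/"
        // The delete of /test is authorized by DELETE on the parent "/",
        // which is why the delete still succeeded.
        System.out.println("DELETE on /test: " + ((testAcl & DELETE) != 0));
        System.out.println("DELETE on parent /: " + ((rootAcl & DELETE) != 0));
    }
}
```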
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 44 weeks, 5 days ago 0|i3ee0v:
ZooKeeper ZOOKEEPER-2771

all resolved bug

Bug Open Major Unresolved Unassigned masoud rezai masoud rezai 02/May/17 04:15   02/May/17 04:15           0 2   9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 46 weeks, 2 days ago 0|i3ecq7:
ZooKeeper ZOOKEEPER-2770

ZooKeeper slow operation log

Improvement Patch Available Major Unresolved Karan Mehta Karan Mehta Karan Mehta 01/May/17 19:03   03/Jun/18 18:15           0 7 0 9600   ZooKeeper is a complex distributed application. There are many reasons why any given read or write operation may become slow: a software bug, a protocol problem, a hardware issue with the commit log(s), a network issue. If the problem is constant, it is trivial to come to an understanding of the cause. However, in order to diagnose intermittent problems we often don't know where, or when, to begin looking. We need some sort of timestamped indication of the problem. Although ZooKeeper is not a datastore, it does persist data and can suffer intermittent performance degradation, so it should consider implementing a 'slow query' log, a feature very common in services that persist information on behalf of clients which may be latency-sensitive while waiting for confirmation of successful persistence.

Log the client and request details if the server discovers, when finally processing the request, that the current time minus arrival time of the request is beyond a configured threshold.

Look at the HBase {{responseTooSlow}} feature for inspiration.
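A sketch of the check described above (the threshold, method name, and message format are hypothetical, chosen in the spirit of HBase's {{responseTooSlow}}):

```java
public class SlowRequestLog {
    // Hypothetical configured threshold, in milliseconds.
    static final long THRESHOLD_MS = 100;

    // Called when the server finishes processing a request: returns the
    // warning line to log, or null when the request was fast enough.
    static String slowLogLine(String client, String op, long arrivalMs, long doneMs) {
        long took = doneMs - arrivalMs;
        if (took < THRESHOLD_MS) {
            return null;
        }
        return "slow request: client=" + client + " op=" + op + " tookMs=" + took;
    }

    public static void main(String[] args) {
        // A 250 ms setData is over the threshold and gets logged;
        // a 3 ms ping is not.
        System.out.println(slowLogLine("10.0.0.5:51324", "setData", 1000, 1250));
        System.out.println(slowLogLine("10.0.0.5:51324", "ping", 1000, 1003));
    }
}
```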
100% 100% 9600 0 pull-request-available 9223372036854775807 No Perforce job exists for this issue. 3 9223372036854775807
1 year, 44 weeks, 4 days ago 0|i3ec8f:
ZooKeeper ZOOKEEPER-2769

Compiled documentation should not be under source control

Bug Resolved Major Not A Bug Andor Molnar Abraham Fine Abraham Fine 28/Apr/17 15:44   14/Oct/17 00:42 13/Oct/17 18:58         0 3   We have xml files that compile into our documentation in src/docs/, and precompiled documentation in docs/.

We should remove the compiled output and keep only the uncompiled documentation under source control.
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 22 weeks, 5 days ago 0|i3e98v:
ZooKeeper ZOOKEEPER-2768

Some ideas about the four letter word commands

Improvement Open Minor Unresolved Unassigned 王震 王震 28/Apr/17 05:57   11/May/17 21:50   3.4.10, 3.5.1, 3.5.2   contrib   0 2   Some ideas about the four letter word commands:
1) About cons: can we add per-command dimension data, such as
now
/10.204.2.39:63943[1](queued=0,recved=7,sent=7,sid=0x154c32e8c2a5b8c,lop=PING,est=1483669807748,
to=10000,lzxid=0xffffffffffffffff,lresp=1493362823544,llat=0,minlat=0,avglat=0,maxlat=1)
-----------------------------------
after
/10.204.2.39:63943[1](queued=0,recved=7,sent=7,sid=0x154c32e8c2a5b8c,lop=PING,est=1483669807748,
to=10000,lzxid=0xffffffffffffffff,lresp=1493362823544,llat=0,minlat=0,avglat=0,maxlat=1,
cmd={{op=ping,count=10000,time=123405,maxTime=34},{op=setData,count=5000,time=2246,maxTime=21},{op=getData,count=3000,time=34345,maxTime=14}})

2) About wchc and wchp: can we add a parameter so they return less data, such as
wchc 0x154c32e8c2a5b8c
wchp /path/temp

3) In many scenarios we need to monitor slow requests in detail, so we need a slow-request log queue, such as

slow

setData /path/temp aaaaaaaaaaaaaaa clientIp useTime
setData /path/temp bbbbbbbbbbbbbbb clientIp useTime


9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 44 weeks, 6 days ago 0|i3e8e7:
ZooKeeper ZOOKEEPER-2767

Correct the exception messages in X509Util if truststore location or password is not configured

Improvement Resolved Trivial Fixed Abhishek Kumar Abhishek Kumar Abhishek Kumar 27/Apr/17 08:14   12/Jul/17 22:59 26/May/17 19:04 3.5.4, 3.6.0 3.5.4, 3.6.0 java client, server   0 5  
In org.apache.zookeeper.common.X509Util.createSSLContext, the exception messages for the truststore location/password checks mention the keystore instead of the truststore:
{noformat}
if (trustStoreLocationProp == null && trustStorePasswordProp == null) {
    LOG.warn("keystore not specified for client connection");
} else {
    if (trustStoreLocationProp == null) {
        throw new SSLContextException("keystore location not specified for client connection");
    }
    if (trustStorePasswordProp == null) {
        throw new SSLContextException("keystore password not specified for client connection");
    }
    try {
        trustManagers = new TrustManager[]{
                createTrustManager(trustStoreLocationProp, trustStorePasswordProp)};
    } catch (TrustManagerException e) {
        throw new SSLContextException("Failed to create KeyManager", e);
    }
}
{noformat}
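A sketch of the corrected checks, with the messages naming the truststore (and, by the same logic, the final catch above should say "Failed to create TrustManager"). The class and method here are illustrative, reduced to the message-selection logic so it runs standalone:

```java
public class TrustStoreCheck {
    // Returns the diagnostic for a given combination of truststore
    // properties, mirroring the branch structure of createSSLContext.
    static String validate(String location, String password) {
        if (location == null && password == null) {
            return "warn: truststore not specified for client connection";
        }
        if (location == null) {
            return "error: truststore location not specified for client connection";
        }
        if (password == null) {
            return "error: truststore password not specified for client connection";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(validate(null, null));
        System.out.println(validate(null, "secret"));
        System.out.println(validate("/path/truststore.jks", null));
        System.out.println(validate("/path/truststore.jks", "secret"));
    }
}
```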
9223372036854775807 No Perforce job exists for this issue. 0 9223372036854775807
2 years, 42 weeks, 6 days ago 0|i3e6jb:
ZooKeeper ZOOKEEPER-2766

Quorum fails with java.io.EOFException

Bug Open Major Unresolved Unassigned Patrick Kleindienst Patrick Kleindienst 25/Apr/17 06:18   19/Jul/17 10:39   3.5.3   leaderElection, quorum   0 4   CentOS-7, Docker version 17.03.1-ce When I start a ZooKeeper ensemble comprising 3 nodes, I'm currently facing the following behavior:
Two nodes (let's say node 2 and 3) out of the three start their own quorum, and finally one of them is elected the new leader (node 3), while the other one becomes the follower (node 2). Node 1 seems to be able to establish a connection to node 3 (elected leader) in my case, but this seems to fail for node 2.
Node 1 shows the following in its logs:

2017-04-25 09:24:02,806 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1055] - LOOKING
2017-04-25 09:24:02,808 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FastLeaderElection@894] - New election. My id = 1, proposed zxid=0x0
2017-04-25 09:24:02,811 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@688] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (
n.peerEPoch), LOOKING (my state)0 (n.config version)
2017-04-25 09:24:02,817 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@457] - Cannot open channel to 2 at election address /9.152.171.98:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:443)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:486)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
at java.lang.Thread.run(Thread.java:745)
2017-04-25 09:24:02,822 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@278] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-25 09:24:03,025 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumCnxManager@457] - Cannot open channel to 2 at election address /9.152.171.98:3888

However, that's not all: the quorum consisting of nodes 2 and 3 does not work properly either, even though the nodes' logs show that leader election between the two works fine.
Here's what node 3 (leader) says:

2017-04-25 09:09:33,842 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2017-04-25 09:09:33,844 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@688] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2017-04-25 09:09:33,851 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2017-04-25 09:09:34,051 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=LeaderElection]
2017-04-25 09:09:34,052 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1143] - LEADING
2017-04-25 09:09:34,055 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@63] - TCP NoDelay set to: true
2017-04-25 09:09:34,055 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@83] - zookeeper.leader.maxConcurrentSnapshots = 10
2017-04-25 09:09:34,056 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@85] - zookeeper.leader.maxConcurrentSnapshotTimeout = 5


And here's the output node 2 (follower) provides:

2017-04-25 09:09:31,875 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (
n.peerEPoch), LOOKING (my state)0 (n.config version)
2017-04-25 09:09:32,077 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=repl
ica.2,name2=LeaderElection]
2017-04-25 09:09:32,077 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1131] - FOLLOWING
2017-04-25 09:09:32,082 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@88] - TCP NoDelay set to: true

So far, so good. But seconds later the connection between nodes 2 and 3 seems to get lost, causing leader node 3 to report an EOFException. If I understand the logs correctly, node 2 (the follower) properly closes the connection (sending "Goodbye"), whilst node 3 says that the socket is still open.

2017-04-25 09:09:34,190 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@414] - LEADING - LEADER ELECTION TOOK - 138 MS
2017-04-25 09:09:34,197 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileTxnSnapLog@320] - Snapshotting: 0x0 to /data/version-2/snapshot.0
2017-04-25 09:09:35,076 [myid:3] - INFO [LearnerHandler-/9.152.171.98:51328:LearnerHandler@382] - Follower sid: 2 : info : 9.152.171.98:2888:3888:participant;0.0.0.0:2181
2017-04-25 09:09:35,113 [myid:3] - INFO [LearnerHandler-/9.152.171.98:51328:LearnerHandler@683] - Synchronizing with Follower sid: 2 maxCommittedLog=0x0 minCommittedLog=0x0 lastProcessedZxid=0x0 peerLastZxid=0x
0
2017-04-25 09:09:35,114 [myid:3] - INFO [LearnerHandler-/9.152.171.98:51328:LearnerHandler@727] - Sending DIFF zxid=0x0 for peer sid: 2
2017-04-25 09:09:35,133 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@1258] - Have quorum of supporters, sids: [ [2, 3],[2, 3] ]; starting up and setting last processed zxid: 0x100000000
2017-04-25 09:09:35,169 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):CommitProcessor@255] - Configuring CommitProcessor with 2 worker threads.
2017-04-25 09:09:35,179 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ContainerManager@64] - Using checkIntervalMs=60000 maxPerMinute=10000
2017-04-25 09:09:35,196 [myid:3] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@919] - Connection broken for id 2, my id = 3, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:904)
2017-04-25 09:09:35,196 [myid:3] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@922] - Interrupting SendWorker
2017-04-25 09:09:35,197 [myid:3] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@836] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:986)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:824)
2017-04-25 09:09:35,197 [myid:3] - WARN [SendWorker:2:QuorumCnxManager$SendWorker@845] - Send worker leaving thread id 2 my id = 3
2017-04-25 09:09:35,204 [myid:3] - ERROR [LearnerHandler-/9.152.171.98:51328:LearnerHandler@604] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515)
2017-04-25 09:09:35,204 [myid:3] - WARN [LearnerHandler-/9.152.171.98:51328:LearnerHandler@619] - ******* GOODBYE /9.152.171.98:51328 ********
2017-04-25 09:09:37,181 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@626] - Shutting down
2017-04-25 09:09:37,182 [myid:3] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Leader@632] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ [3] ]
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:632)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:612)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1146)

Unfortunately, node 2 does not provide any additional information on what exactly is going on. After leader election, the only thing it reports is this:

2017-04-25 09:09:32,091 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@68] - FOLLOWING - LEADER ELECTION TOOK - 13 MS
2017-04-25 09:09:32,094 [myid:2] - WARN [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@273] - Unexpected exception, tries=0, remaining init limit=9999, connecting to /9.152.171.12:288
8
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
2017-04-25 09:09:33,142 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@369] - Getting a diff from the leader 0x0
2017-04-25 09:09:33,146 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@516] - Learner received NEWLEADER message
2017-04-25 09:09:33,207 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@499] - Learner received UPTODATE message
2017-04-25 09:09:33,220 [myid:2] - WARN [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1446] - Restarting Leader Election
2017-04-25 09:09:33,221 [myid:2] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@665] - Leaving listener
2017-04-25 09:09:33,222 [myid:2] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@836] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:986)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:824)
2017-04-25 09:09:33,222 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@919] - Connection broken for id 3, my id = 2, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:224)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:904)

As far as I can tell, node 2 wants to start a new leader election but fails to establish a connection to the other nodes. It retries over and over, finally ending in a timeout. Unfortunately, this gives me no hint as to what exactly breaks the connection between the follower node (node 2) and the leader node (node 3), or why it can later be re-established.

It might also be relevant that I'm running ZooKeeper in Docker containers, using the host network option.
ZooKeeper ZOOKEEPER-2765

modern C++ client

New Feature · Resolved (Won't Fix) · Major · Assignee/Reporter: Edward Carter · Created: 19/Apr/17 11:43 · Resolved: 30/Jan/19 11:15 · Component: c client · Labels: pull-request-available

We should add a modern C++ (i.e. C++14, C++17, etc.) client library that wraps the existing C client. A future issue may replace the C client itself.
ZooKeeper ZOOKEEPER-2764

By default, only the srvr four-letter word is on the whitelist, while the documentation says all are

Bug · Closed (Not A Bug) · Minor · Unassigned · Reporter: Arne Bachmann · Created: 19/Apr/17 05:54 · Affects/Fix Version: 3.5.3

Using the same Vagrant provisioning script as for 3.5.2-alpha, suddenly all monitoring tools told me that the ZK instance was unavailable or had an error. Investigating further, the instance was fine as a follower, but the response to telnet "ruok" was actually "ruok ... is not in the whitelist".
Is this a new default that is not yet reflected in the documentation? The documentation says that since 3.4.10 there is a whitelist option, but that all commands are on it by default (the same as 4lw.commands.whitelist=*).
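For reference, the parenthetical in the description corresponds to this zoo.cfg setting; enabling every four-letter word explicitly restores the behavior the documentation describes:

```
# zoo.cfg — whitelist all four-letter-word commands
4lw.commands.whitelist=*
```

A safer production alternative is to list only the commands your monitoring needs, e.g. 4lw.commands.whitelist=ruok,stat.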
ZooKeeper ZOOKEEPER-2763

Utils.toCsvBuffer() omits leading 0 for bytes < 0x10

Bug · Resolved (Won't Fix) · Minor · Assignee: Alburt Hoffman · Reporter: Brandon Berg · Created: 19/Apr/17 02:34 · Affects Version: 3.5.2 · Component: jute · Labels: pull-request-available

org.apache.jute.Utils.toCsvBuffer(), which converts a byte array to a string containing the hex representation of that byte array, omits the leading zero for any byte less than 0x10, due to its use of Integer.toHexString, which has the same behavior.

https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/jute/Utils.java#L234

One consequence of this is that the hex strings printed by ClientCnxn.Packet.toString(), used in the debug logging for ClientCnxn.readResponse(), cannot be parsed to determine the result of a ZooKeeper request from client debug logs.

Utils.toXmlBuffer() appears to have the same issue.
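A minimal sketch (not the actual jute code) showing why Integer.toHexString drops the leading zero for bytes below 0x10, and the usual zero-padded alternative:

```java
public class HexPad {
    // Buggy approach, analogous to what Utils.toCsvBuffer does:
    // Integer.toHexString emits the shortest representation, so 0x0f -> "f".
    static String hexNoPad(byte b) {
        return Integer.toHexString(b & 0xff);
    }

    // Padded approach: always emit exactly two hex digits per byte.
    static String hexPadded(byte b) {
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte b = 0x0f;
        System.out.println(hexNoPad(b));  // prints "f"
        System.out.println(hexPadded(b)); // prints "0f"
    }
}
```

The `& 0xff` masking is needed in both variants because Java bytes are signed; without it, a byte like 0xab would sign-extend to a negative int.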
ZooKeeper ZOOKEEPER-2762

ZOOKEEPER-2728 Multithreaded correctness Warnings

Sub-task · Resolved (Fixed) · Major · Assignee/Reporter: Abraham Fine · Created: 18/Apr/17 17:57 · Fix Version: 3.4.11
ZooKeeper ZOOKEEPER-2761

Build is packaged as an uncompressed tar archive but the file name ends with .gz

Bug · Open (Unresolved) · Major · Unassigned · Reporter: Arne Bachmann · Created: 18/Apr/17 07:31 · Affects Version: 3.5.3 · Component: build · Environment: Windows, Ubuntu Linux

This breaks my build scripts. It worked fine with 3.5.2-alpha.

Using 7-Zip on Windows I got a warning, but the archive was extracted fine.

On Linux, tar -xzf exits with an error code, as it pipes through gunzip, which encounters an invalid file (the archive appears to be a plain, uncompressed tar).

Hence the huge file size, presumably.
Labels: build, easyfix
ZooKeeper ZOOKEEPER-2760

AArch64 build error: Error: unknown mnemonic `lock' -- `lock xaddl x1,[x0]'

Bug · Resolved (Duplicate) · Blocker · Unassigned · Reporter: Yuqi Gu · Created: 18/Apr/17 03:49 · Affects/Fix Version: 3.4.10 · Components: build, c client · Environment: Hisilicon Taishan AArch64 (Cortex-A57 @ 2.1 GHz), 16.04.2 LTS (Xenial Xerus)
ZooKeeper 3.4.10 is integrated into Apache Bigtop:
https://github.com/apache/bigtop/commit/b00ac093634437e749561c8837179d13d95fda91

But a compile error occurred when we built the Bigtop ZooKeeper component on AArch64:

"[exec] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/ws/output/zookeeper/zookeeper-3.4.10/src/c -I/ws/output/zookeeper/zookeeper-3.4.10/src/c/include -I/ws/output/zookeeper/zookeeper-3.4.10/src/c/tests -I/ws/output/zookeeper/zookeeper-3.4.10/src/c/generated -Wdate-time -D_FORTIFY_SOURCE=2 -DTHREADED -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF .deps/libzkmt_la-mt_adaptor.Tpo -c /ws/output/zookeeper/zookeeper-3.4.10/src/c/src/mt_adaptor.c -fPIC -DPIC -o .libs/libzkmt_la-mt_adaptor.o
[exec] Makefile:946: recipe for target 'libzkmt_la-mt_adaptor.lo' failed
[exec] make[2]: Leaving directory '/ws/output/zookeeper/zookeeper-3.4.10/build/c'
[exec] /tmp/cc4YHZ73.s: Assembler messages:
[exec] /tmp/cc4YHZ73.s:1713: Error: unknown mnemonic `lock' -- `lock xaddl x1,[x0]'
[exec] make[2]: *** [libzkmt_la-mt_adaptor.lo] Error 1
"
ZooKeeper ZOOKEEPER-2759

Flaky test: org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.testNoAuthLearnerConnectToAuthRequiredServerWithHigherSid

Bug · Resolved (Fixed) · Major · Assignee/Reporter: Abraham Fine · Created: 17/Apr/17 18:45 · Affects Version: 3.4.10 · Fix Version: 3.4.11
ZooKeeper ZOOKEEPER-2758

Typo: transasction --> transaction

Bug · Resolved (Fixed) · Trivial · Assignee/Reporter: Jeff Widman · Created: 15/Apr/17 02:30 · Fix Versions: 3.4.11, 3.5.4, 3.6.0

Typo in src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
ZooKeeper ZOOKEEPER-2757

Incorrect path crashes zkCli

Bug · Resolved (Fixed) · Minor · Assignee: Abraham Fine · Reporter: Flavio Paiva Junqueira · Created: 14/Apr/17 17:58 · Affects Version: 3.5.3 · Fix Versions: 3.5.4, 3.6.0

If I try {{delete test}} without the leading /, then the CLI crashes with this exception:

{noformat}
Exception in thread "main" java.lang.IllegalArgumentException: Path must start with / character
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1659)
at org.apache.zookeeper.cli.DeleteCommand.exec(DeleteCommand.java:83)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:655)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:586)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:370)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:330)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:292)
{noformat}

It should really fail the operation rather than crash the CLI.
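A hypothetical sketch of that behavior (the class and method names below are illustrative, not ZooKeeper's actual fix): catch the IllegalArgumentException thrown by path validation at the command level, so the one command fails but the CLI keeps running.

```java
public class CliPathCheck {
    // Illustrative stand-in for executing a single CLI command.
    static String runDelete(String path) {
        try {
            if (!path.startsWith("/")) {
                // This mirrors what PathUtils.validatePath throws.
                throw new IllegalArgumentException("Path must start with / character");
            }
            return "deleted " + path;
        } catch (IllegalArgumentException e) {
            // Report the failure instead of letting it propagate out of main().
            return "Command failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDelete("test"));  // fails gracefully
        System.out.println(runDelete("/test")); // succeeds
    }
}
```

The point is simply where the exception is caught: validation still rejects the bad path, but the error surfaces as a command result rather than an uncaught exception that kills the process.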
Generated at Fri Mar 20 00:35:18 UTC 2020 by Song Xu using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.